Abstract

While the use of volatilities is pervasive throughout finance, our ability to determine the
instantaneous volatility of stocks is nascent. Here, we present a method for measuring the temporal
behavior of stocks, and show that stock prices for 24 DJIA stocks follow a stochastic process that
describes an efficiently priced stock while using a volatility that changes deterministically with
time. We find that the often observed, abnormally large kurtoses are due to temporal variations
in the volatility. Our method can resolve changes in volatility and drift of the stocks as fast as a
single day using daily close prices.
I. INTRODUCTION

In this paper, we study the temporal behavior of the distribution of stock prices for 24 
stocks in the Dow Jones Industrial Average (DJIA). This is done using a new method of 
measuring changes in the volatility and drifts of stocks with time. When this method is 
applied to time-series constructed from the daily close of stocks, changes as fast as one 
day can be seen in both. Given that it is not possible to accurately measure (as opposed
to predict) intraday changes in the volatility using only daily-close data, for two of the 24
stocks we have been able to reach the maximum resolution (known as the Nyquist criterion)
of one day in the rate that the volatility can change, while for the great majority of the 
remaining stocks, we have come within one day of this maximum. We believe that this 
method can measure changes in the volatility and drift that occur during the trading day 
as well if intraday price data is used. But even with only daily-close data, we have been 
extraordinarily successful at determining the temporal behavior of stocks in general, and of 
the volatility in particular, and in the process, we have furthered our understanding of the 
behavior of stock prices as a whole. 

We find that the stock prices of these 24 stocks can be well described by a stochastic 
process for which the volatility changes deterministically with time. On the one hand, this 
is a process where the yield at any one time is not correlated with the yield at any other 
time; the process thus describes an efficiently priced stock. On the other hand, this is 
a process where the predicted kurtosis agrees with the sample kurtosis of the stock; the 
process thus also provides a solution to the long-standing problem of explaining how an
efficiently priced stock can have a kurtosis that is so different from what is expected for 
a Gaussian distribution. Indeed, we find that abnormally large kurtoses are due solely to 
changes in the volatility of the stock with time. When this temporal behavior is accounted 
for in the daily yield, the kurtosis reduces dramatically in value, and now agrees well with 
model predictions. This finding is in agreement with Rosenberg's (1972) observation that 
the kurtosis for nonstationary random variables is larger than the kurtosis of individual
random variables. We have also determined changes in the volatility of these stocks, and 
for three of the 24 stocks, variations of as fast as one day can be seen. For another 16 
stocks, this temporal resolution was two days or less, and for only five of the 24 stocks is 
this resolution longer than 2.5 days. 



The behavior of the drifts for all 24 stocks can also be determined using this method, and 
with the same resolution as their volatility. We find that the drift for the majority of the 
stocks is positive; these drifts thus tend to augment the increase of the stock price caused 
by the random-walk nature of the stochastic process. This finding is not surprising, nor is 
it surprising that we find that the drift is much smaller than the volatility for all 24 stocks. 
What is surprising is that for three of the 24 stocks the drift is uniformly negative. For these 
stocks, the drift tends not to increase the stock price, but to depress it. That the stock prices
for these three stocks increase at all is because this drift is much smaller in magnitude
than the volatility. Over the short term, growth in the prices of these stocks, as for all 24
stocks, is due to a random walk, and thus driven more by the volatility than the
drift. Indeed, this is the only reason that the prices of these stocks increase with time.

Finally, the distribution of the stock prices for the 24 DJIA stocks has been determined. 
When the temporal variation in the volatility is corrected for in the daily yield, we find 
that the resultant distribution for all but four of the stocks is described by a Rademacher 
distribution with the probability that the yield increases on any one day being 1/2. For the 
four other stocks, the distribution is described by a generalized Rademacher distribution 
with the probability that the yield increases on any one day being slightly greater than the 
probability that it decreases. 

II. BACKGROUND, PREVIOUS WORK, AND A SUMMARY OF THE APPROACH

In 2005, 403.8 billion shares were traded on the New York Stock Exchange (NYSE) with a 
total value of $14.1 trillion (see NYSE). During the same period, 468 million contracts
were written on the Chicago Board Options Exchange (CBOE) with a total notional value of
$12 trillion. At the NYSE, traders, investors, and speculators, big and small, place
bets on the movement of stock prices, whether up or down. Profits are made, or losses 
are reconciled, based on the changing price of the stock. As such, great effort is made 
to predict the movements of stock prices in the future, and thus much attention — with 
attending analysis — is focused on the price of stocks. 

At the CBOE, traders, investors, and speculators write or enter into contracts to purchase
or sell a predetermined amount of stocks at a set time in the future. Profits here are made,
or losses reconciled, based on the degree of risk that the movement of the stock will be down
when expected to be up, or up when expected to be down. Here, it is not so much the 
price of the stock that matters. It is the amount of volatility in the stock, and predicting 
how stock prices may move in the future is much less important. Indeed, the pricing of 
options — through the Black-Scholes equation and its variants — is based on the argument 
that it is not possible to predict how the price of stocks will change in the future. In this 
pricing, it is taken for granted that the markets are efficient, and that earning returns which 
are in excess of the risk-free interest rate is not possible. All is random, and the increase in 
stock prices seen is due to a simple random walk with a (small) drift. Great interest is thus 
paid in modeling the distribution of stock prices, and the application of these models to the 
pricing of options and derivatives. 

Given the $26.1 trillion in trades and contracts at the NYSE and CBOE in 2005,
it is not surprising that much effort has been expended in determining the properties of the
stock market. Given the precipitous drop in stock market prices in October of 2008, which
occurred over a period of days, accurate determination of how these properties change with
time has become even more important. Since the work by Bachelier (1900) at the turn of the 
20th century, a great deal of these efforts have been focused on determining the distribution 
of the daily yields of stock prices (Osborne 1959a and Osborne 1959b). Inherent in this 
determination is determining the volatility of the distribution. Use of this volatility is now 
pervasive in modern finance, and is a critical ingredient in such endeavors as the pricing 
of options, the general assessment of risk and the determination of the value of assets at risk,
and the construction of optimal portfolios. That this effort continues today is indicative of 
the difficulty in determining this distribution, its importance in modern finance, and the 
financial impact that its determination can have. 

While Bachelier (1900) characterized the distribution as a random walk with the prices
of the stock having a given drift and a constant volatility, it has been known since the
detailed analysis of the behavior of stock prices by Fama (1965) that the distribution of daily
yields is only approximately Gaussian; the distribution calculated by Fama — which does 
not take into account variations in the volatility with time — has a fatter tail than expected 
for a Gaussian distribution. Indeed, it is typically found that the kurtosis can be as high 
as 100, while by comparison the kurtosis of a Gaussian distribution is only three. This 
discrepancy between the distribution of daily yields as they are traditionally calculated
and the Gaussian distribution, while seemingly an inconsequential detail, nonetheless has
wide-ranging consequences. 

Mathematics tells us that if the distribution of daily yields of a stock is a Gaussian 
distribution, then the daily yield on any one day cannot depend on the daily yield on 
any other. This is the Central Limit Theorem (CLT), and it is embodied in a number of 
ways — the various forms of the Efficient Market Hypothesis (EMH) (see Fama 1970 and 
Fama and French 1988), and the no-arbitrage condition — in modern finance. This lack 
of predictability is one of the underlying assumptions used in the pricing of derivatives. 
Mathematics says we can also turn the statement of the CLT around, however. Namely, 
if the daily yield on any one day does not depend on the daily yield on any other, then 
the distribution of daily yields must necessarily be a Gaussian distribution as long as the 
number of days used in its determination is large enough, and as long as the distribution is 
well behaved. 

In the face of this mathematical result, there are two possibilities. The first possibility is 
that the distribution of daily yields for stocks is not Gaussian. The daily yield on one day 
does depend on the daily yield on some other day, and it is possible, in principle, to predict 
future stock prices by looking at historical prices. The second possibility is that the EMH 
nevertheless holds, and there are good, albeit unknown, reasons for the unexpectedly large 
kurtosis. The situation is further muddied when the autocorrelations of the daily yield of 
stocks are calculated. It is well known from these calculations that the values of the daily
yield on different days are uncorrelated with each other, and we have seen this behavior for
the stocks studied here as well. This independence extends also to other asset classes, as 
shown by Kendall (1953). 

There have been numerous attempts at using other distributions to describe the distribution
of stock prices: the Levy and its generalization, the Pareto, proposed by Mandelbrot (1963);
the Student t-distribution proposed by Blattberg and Gonedes (1974); and the discrete mixture
of Gaussian distributions model proposed by Kon (1984) (see Toyli, Sysi-Aho, and Kaski 2004
for an overview and assessment). These attempts are based on the belief
that the second of the two possibilities holds, and that the reason for the overly large kurtosis
is that the distribution used to describe the stock was not correct. As such, for
these distributions the daily yield on any one day also does not depend on the daily yield on
any other day, and the consequences of the CLT are instead evaded in various ways. These
approaches have had various degrees of success. For example, while the Pareto distribution
does have a fatter tail than the Gaussian distribution and has a kurtosis that can agree
with observations, all moments with order greater than an integer k, which determines the
power-law behavior of the distribution, are ill defined; in this way, the distribution is not
well behaved, and thus does not fall within the class of distributions for which the CLT is
applicable. For the Levy distribution, the volatility itself (as well as all higher moments) is
ill defined, requiring the truncation of the distribution to price options using this model, as
described in Kleinert (2002). The Student t-distribution differs significantly from the Gaussian
distribution only when the number of data points is small (thereby evading the CLT),
which raises the question of what happens when this distribution is applied to time-series with
more than, say, 200 terms in them. Kon's model is a mixture of Gaussian distributions, and
thus the moments of his distribution are all finite. However, while the model is effective at
describing the large kurtosis of stocks, it is nonetheless an empirical model; the origin of the
discrete mixture is not known, and the parameters used in its construction are determined
only after the model is fitted to the stock data.

Our approach is also based on the belief that the second of the two possibilities holds. But
unlike the previous attempts at describing the distribution of stock prices mentioned above, 
we find that the underlying reason for the overly large kurtosis is because time variations 
in the distribution of stocks have not been properly taken account of. As observed by 
Rosenberg (1972), it is often assumed that the distribution of stock prices being analyzed 
does not change during the period of interest. This assumption was certainly made for all 
the models described above. In contrast to these approaches, we will take time variations 
in the distribution explicitly into account. Doing so results in a distribution that can both 
explain the abnormally large kurtosis, and still have the property that the yield on any one day is
not correlated with the yield on any other. In the process, we will also be able to determine,
for the first time, how the volatility and the drift change instantaneously with time.

That the volatility of stocks changes with time is not a new observation. This behavior 
has been known since at least the work by Osborne (1962) (see also Lo 1988), and analyzed
explicitly by Rosenberg (1972). Indications of this have been reported by many others
since then (Ball and Torous 1985, French and Roll 1986, Conrad and Kaul 1988, Andersen
and Bollerslev 1997, Kullmann, Toyli, Kertesz, Kanto, and Kaski 1999, Nawroth and Peinke
2006). Much effort has since been made to determine how this volatility, and thus necessarily
how the distribution, changes with time, with the main focus of this effort on extending
the usual random walk description of stock market prices. This has led to the introduction
of the jump-diffusion model proposed by Merton (1976), where discrete, random jumps in
the prices of a stock in time are incorporated in continuous stochastic processes, and to
stochastic volatility models developed over a number of years by Praetz (1972), Christie
(1982), Hull and White (1987), Scott (1987) and Heston (1993) where the volatility itself
is modeled as a stochastic process with its own drift and volatility (see also Muzy, Delour,
and Bacry 2000 for a multifractal-inspired, stochastic volatility model). However, as
was pointed out by Hull and White (1987), methods for directly measuring time-varying
volatilities were not, at the time, known.

This inability to directly measure variations in the volatility has greatly constrained efforts 
in studying how the volatility of real market data varies with time. To a great extent, it has 
also driven the development of stochastic volatility models. By characterizing the volatility 
as a stochastic process, a time varying volatility can be modeled using a comparatively 
simple choice of a constant drift and a constant volatility for the process. Even then, 
however, parameters in stochastic volatility models are determined not by a direct analysis 
of the daily yields of stock prices, but are instead determined indirectly. Namely, the price 
of an option for a stock is calculated for the process in terms of a set of model parameters, 
and these parameters are then set by adjusting their values until the calculated price agrees 
with the market price of the option. 

The inherent difficulty in determining from market data how a distribution changes with
time is described in Boyle and Ananthanarayanan (1977), and is straightforward to understand.
To determine the distribution of a stock, a collection of stock prices is required;
the larger the collection, the better. Since stock prices change sequentially in time, this
collection has to be done over a period of time, and because a relatively large collection
is needed, this period must be correspondingly long. For example, most distributions are
calculated using the daily close of a stock, if for no other reason than because these prices
are readily available in the public domain. If the collection of prices used is as large as 500 
daily closes, then the stock prices in this collection must span a period of nearly two years; 
Fama (1965), for example, used stock prices that span a period of up to six years in his 
analysis. Using a collection of 500 stock prices to determine the distribution of the stock 
through standard methods means that one is tacitly assuming that the price of the stock
two years ago belongs to the same distribution, with the same volatility, as the price of the
stock today. This strains credibility, especially given the rapid movements in the markets
during the last quarter of 2008. While it is possible to calculate the distribution with a smaller
number of stock prices and thus shorten the period of time over which they are collected,
statistical errors inherent in determining the distribution are proportional to $1/\sqrt{N}$, where
$N$ is the number of data points in the sample, and will thus be correspondingly larger. At
some point, the period of time would be so short that we would not be able to say whether
the distribution is Gaussian or not. Indeed, we only have to look at the extreme case where the
period is so short that there are only three stock prices collected over three days, resulting in
only two daily yields to determine the whole distribution; this clearly cannot be done with
any certainty whatsoever! (This inherent difficulty has led to the development of other
approaches to calculating volatility, such as those found in Ball and Torous 1984, Parkinson
1980, and Longin 2005, where the number of daily closes needed is reduced.)
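
To make the $1/\sqrt{N}$ scaling concrete, the following is a minimal sketch that is not part of the original analysis. It assumes Gaussian daily yields with a constant, hypothetical volatility of 1%, and measures how much the sample volatility scatters across repeated samples of N yields. The function name `relative_spread` and all parameter values are illustrative assumptions. For Gaussian samples, the scatter of the sample standard deviation is approximately the true volatility divided by $\sqrt{2N}$ when N is large.

```python
import math
import random
import statistics


def relative_spread(n_yields, n_trials=400, sigma=0.01, seed=0):
    """Scatter of the sample volatility across trials, as a fraction of the true sigma."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # One hypothetical sample of daily yields with constant volatility.
        sample = [rng.gauss(0.0, sigma) for _ in range(n_yields)]
        estimates.append(statistics.stdev(sample))
    return statistics.stdev(estimates) / sigma


if __name__ == "__main__":
    # Compare the simulated scatter with the large-N Gaussian rule 1/sqrt(2N).
    for n in (2, 20, 125, 500):
        print(n, round(relative_spread(n), 3), round(1.0 / math.sqrt(2 * n), 3))
```

With only two yields (three closing prices) the scatter should be a sizable fraction of the volatility itself, whereas at 500 yields it should fall to a few percent, in line with the behavior described above.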

Mathematics does not require that the distribution remains constant. The general theory 
of stochastic processes allows for volatilities that change with time. In fact, we will show 
below that even though the volatility of a Gaussian distribution may change with time, the 
daily yield on any one day need not depend on the daily yield on any previous day; the 
EMH still holds for this case. Instead, what has been lacking up to now is a method for 
calculating the statistical properties of a stock when the volatility changes with time. This 
we have been able to do. 
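
The claim that a time-varying volatility can coexist with uncorrelated yields, while still inflating the kurtosis, can be illustrated with a short calculation. This is a sketch under stated assumptions, not the derivation given later in the paper: the notation ($r_t$, $\sigma_t$, $\epsilon_t$, $\kappa$) is introduced here for illustration, the drift is set to zero, and the kurtosis is the one obtained by pooling all N days into a single sample.

```latex
% Assumed toy model: r_t = \sigma_t \epsilon_t, with \epsilon_t i.i.d. N(0,1),
% \sigma_t deterministic, zero drift, and t = 1, ..., N.
\begin{align}
  E[r_t r_s] &= \sigma_t \sigma_s \, E[\epsilon_t \epsilon_s] = 0 \quad (t \neq s), \\
  \langle \sigma^n \rangle &\equiv \frac{1}{N} \sum_{t=1}^{N} \sigma_t^n, \\
  \kappa &= \frac{\frac{1}{N}\sum_t E[r_t^4]}{\left(\frac{1}{N}\sum_t E[r_t^2]\right)^2}
          = 3\,\frac{\langle \sigma^4 \rangle}{\langle \sigma^2 \rangle^2} \;\ge\; 3 .
\end{align}
% The inequality is Cauchy-Schwarz, with equality only when \sigma_t is constant.
% Example: \sigma_t equal to \sigma_0 on half of the days and 3\sigma_0 on the other half gives
% \kappa = 3 \cdot \frac{(1 + 81)/2}{((1 + 9)/2)^2} = 3 \cdot \frac{41}{25} = 4.92 .
```

In this toy setting the yields on different days remain uncorrelated, yet any variation of the volatility in time pushes the pooled kurtosis above the Gaussian value of three.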

Our approach is based on the observation that when the volatility depends solely on time, 
we can remove the time dependence of the distribution by dividing the daily yield by the 
volatility. This standardizes the daily yield, and a Gaussian distribution with a time-varying
volatility is mapped into a Gaussian distribution with unit volatility. The volatility of this 
distribution is clearly constant, and thus the standardized daily yields all belong to the same 
distribution. The inherent difficulty in determining a distribution that changes with time 
mentioned above is thus circumvented. Indeed, large collections of stock prices are now a 
benefit — they result in smaller standard errors — and not a detriment. That the volatility 
of the mapped distribution is known then allows us to determine how the volatility of the 
original daily yield changes with time. In addition, it is readily apparent from our analysis 
that the distribution of standardized yields is equivalent to a special case of the binomial 
distribution, and this observation allows us to extract easily the temporal behavior of the
drift of the yield as well.
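
The effect of standardizing by the volatility can be illustrated with a small simulation. The sketch below is not the method of this paper: it assumes the volatility is already known, whereas the paper must estimate it from the data. The volatility profile `toy_volatility`, the zero drift, and all numerical values are hypothetical choices made only for illustration.

```python
import random


def kurtosis(xs):
    """Sample kurtosis m4 / m2**2 (a Gaussian gives 3)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def toy_volatility(t):
    """Hypothetical deterministic volatility: calm days with periodic turbulent bursts."""
    return 0.03 if (t % 250) < 25 else 0.008


def simulate(n_days=20000, seed=1):
    """Zero-drift Gaussian yields whose volatility follows toy_volatility."""
    rng = random.Random(seed)
    sigmas = [toy_volatility(t) for t in range(n_days)]
    yields = [s * rng.gauss(0.0, 1.0) for s in sigmas]
    return sigmas, yields


sigmas, yields = simulate()
# Dividing each yield by its own volatility maps every day onto unit volatility.
standardized = [r / s for r, s in zip(yields, sigmas)]

if __name__ == "__main__":
    print("kurtosis of raw yields:         ", round(kurtosis(yields), 2))
    print("kurtosis of standardized yields:", round(kurtosis(standardized), 2))
```

In this toy example the raw yields should show a kurtosis well above three, while the standardized yields should return to a value close to three, since they all belong to the same unit-volatility distribution.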

This approach is straightforward, and at its heart resembles the process one goes through 
in using a table of values for the cumulative standardized Gaussian distribution: The random 
variable at hand is scaled with its volatility to get the standardized Gaussian distribution 
with unit volatility. The difference is that in our case the volatility is not known a priori;
it must be determined. This is done using a combination of statistical methods, Fourier 
analysis, and signal processing techniques. While prevalent in other fields, many of these 
techniques are not commonly found in the finance or business literature, and it would be easy 
to become too involved with the mathematics while neglecting the finance when presenting 
our results. To avoid doing so, we will focus on finance in the main body of the paper, and 
when our model of stock prices is constructed, it will be motivated by, and justified with, 
an analysis of the time-series of the stocks at hand. Importantly, a validation of each step 
taken will be made. Only enough of the underlying mathematical analysis needed to explain
the essential ideas behind our approach will be presented in the main body of the paper; we 
will refer the reader to the appendices for many of the details. Our analysis will be applied 
explicitly to Coca Cola stock in this paper to demonstrate the underlying ideas behind the 
approach. This stock is chosen out of the 24 because for our purposes its underlying behavior 
is representative of all the others. Analysis of the other 23 stocks studied here follows in much
the same way, and we will only present a summary of the results for them, along with graphs 
of the volatilities for all 24 stocks as a function of trading day. 

III. MODEL VALIDATION AND OUR CHOICE OF STOCKS 

It would not be an exaggeration to say that the only characterizations of a stock that
are not model dependent are the price per share that it was sold at, the day and time it was
sold, and the total number of shares of the stock that were sold over a given period of time.
These are the only characterizations that are objective and verifiable, and for which all can
agree on how they are obtained. The distribution of the daily yields of a stock certainly is
not, and herein lies the problem: How should any model of stock prices be validated? 

To see how difficult the problem of validation is to resolve (this issue was explicitly studied
by Magdon-Ismail and Abu-Mostafa 1998 for volatility models), consider the volatility
of the 24 stocks considered here. As we will calculate the volatility of these stocks, it would
seem that a comparison of the volatilities we obtain here with the volatilities calculated
using any one of the many other approaches in the literature would be an effective way of 
assessing the validity of our model. However, irrespective of the approach taken to make 
this calculation, assumptions about the behavior of the stock will have already been made. 
The historical volatility, for example, uses a moving average to calculate the volatility on 
any given trading day. It implicitly assumes that the volatility does not change significantly 
over the window of time used when calculating the average, and thus cannot effectively 
measure changes in the volatility that occur within this window. The implicit volatility, 
developed over a series of papers by Latane and Rendleman (1976), Schmalensee and Trippi 
(1978), and Beckers (1981), can measure instantaneous changes in the volatility, but it is 
calculated by inverting the Black-Scholes (or any other) equation for pricing options, and 
thus implicitly assumes that the particular pricing equation used accurately prices the option
at any given time. Autoregression approaches to calculating the volatility, such as
the exponentially weighted moving average (EWMA), the autoregressive conditional
heteroskedasticity (ARCH) proposed by Engle (1982), the generalized autoregressive conditional
heteroskedasticity (GARCH) proposed by Bollerslev (1986), and a new approach that
combines autoregressive and Fourier (spectral) analysis techniques proposed by Bollerslev
and Wright (2001), are designed more to manage volatilities that change with time than
to characterize them. They depend on one or more parameters that must subsequently be
set using some property of a stock, and are not designed to determine how the volatility
changes. Stochastic volatility models explicitly consider volatilities that change (randomly)
with time, but to determine how this volatility changes, the approach adjusts the parameters
that determine the volatility until the predicted option prices agree with market prices
(see Lamoureux and Lastrapes 1993 for a test of this approach). Using a comparison of
volatilities to validate models is therefore more a comparison of the underlying models of 
the market or methods of calculation than it is of the volatilities themselves. Indeed, the 
question of which approach to calculating the volatility is the better one is one that has
been addressed many times over the years (see Day and Lewis 1992, Canina and Figlewski
1993, Jorion 1995, Figlewski 1997, Andersen and Bollerslev 1998, Chong, Ahmad, and Abdullah
1999, Szakmary, Ors, Kim, and Davidson II 2003, and McMillan and Speight 2004),
apparently without consensus.
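
The limitation of the historical volatility noted above can be seen in a toy example. The sketch below assumes a trailing window (the paper does not specify how the window is placed) and a hypothetical volatility that jumps from 1% to 4% on a single day; the function names, window length, and parameter values are all illustrative assumptions.

```python
import random
import statistics


def rolling_volatility(yields, window):
    """Sample standard deviation over the trailing `window` yields; None until the window is full."""
    out = []
    for i in range(len(yields)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(yields[i + 1 - window : i + 1]))
    return out


def simulate_step(n_days=1000, jump_day=500, sigma_before=0.01, sigma_after=0.04, seed=2):
    """Zero-drift Gaussian yields whose volatility jumps once, on jump_day."""
    rng = random.Random(seed)
    return [
        rng.gauss(0.0, sigma_before if t < jump_day else sigma_after)
        for t in range(n_days)
    ]


yields = simulate_step()
estimate = rolling_volatility(yields, window=60)

if __name__ == "__main__":
    # Day 499 is the last calm day; day 510 mixes both regimes; day 600 is fully turbulent.
    for day in (499, 510, 600):
        print(day, round(estimate[day], 4))
```

Ten days after the jump, the 60-day estimate still mixes yields from both regimes, so it should sit between the old and new volatility; the one-day change is smeared over the length of the window.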

This difficulty in validating models is particularly inopportune here. While many of the 
techniques we have used in this paper have long been used in other fields, our approach in
this paper is novel, and has not been used in the analysis of stock market prices before.
We therefore take a particularly stringent approach to validating our model. First, the 
model must be able to explain the observed properties of the stocks. This we accomplish 
by construction. Properties of the stock price are presented first, and the model is then 
constructed explicitly to describe them. Second, the model must be self-consistent, and must 
be able to predict some property of the stock price, which can subsequently be verified. All
models of stock market prices make a certain set of underlying assumptions about properties 
of the price; these assumptions have consequences. These consequences can in turn be used 
to predict properties of the stock price that can then be used to validate it. For our model, 
the distribution of standardized daily yields is described by a Rademacher distribution or 
its generalization. This distribution gives specific values for the population skewness and 
kurtosis, and they provide a simple and statistically meaningful approach to validating our 
model. Specifically, we calculate the sample skewness and kurtosis from each stock's time-series.
We then compare this sample skewness and kurtosis to the population skewness
and kurtosis predicted by our model. If the sample skewness and kurtosis agree with the 
population skewness and kurtosis of our model at the 95% confidence level (CL), we assert 
that our model is valid. In fact, we find that this agreement holds for all 24 stocks considered 
here, and it does so over the whole of the time period spanned by their time series. Indeed, 
for a number of the stocks, this period spans over 80 years. 
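
The comparison of sample and population moments can be sketched as follows. The Rademacher distribution (values +1 and -1, each with probability 1/2) has population skewness 0 and kurtosis 1. The two-point generalization used below, with an arbitrary probability `p_up` for +1, is an assumption made for illustration and may differ from the generalization defined in the paper; the standard errors needed for the 95% CL test are given in the paper's appendices and are not reproduced here.

```python
import math
import random


def two_point_population_moments(p_up):
    """Skewness and kurtosis of X = +1 with probability p_up, X = -1 otherwise."""
    q = 1.0 - p_up
    pq = p_up * q
    skewness = (q - p_up) / math.sqrt(pq)
    kurt = (1.0 - 3.0 * pq) / pq
    return skewness, kurt


def sample_skewness_kurtosis(xs):
    """Moment-based sample skewness m3 / m2**1.5 and kurtosis m4 / m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def draw_two_point(n, p_up, seed=0):
    """Hypothetical standardized yields drawn from the two-point distribution."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < p_up else -1.0 for _ in range(n)]


if __name__ == "__main__":
    print("population:", two_point_population_moments(0.5))
    print("sample:    ", sample_skewness_kurtosis(draw_two_point(20000, 0.5)))
```

For a sample of this size, the sample skewness and kurtosis should lie close to the population values of 0 and 1; a validation in the spirit of the text would then ask whether the difference is within the 95% confidence interval.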

It is because of this operational approach to validating our model that we chose to
analyze stocks from the DJIA. First, all the stocks in the DJIA are large caps, and have
a large daily trading volume; they are precisely the type of stocks for which we expect the 
market to be efficient. They are in this way similar, and we would expect they can be 
described by the same type of model. Second, each of these companies has been publicly 
traded for a number of years. We therefore have access to a large collection of daily close 
prices for these stocks with which to construct their time-series. These time series, for 
example, range in time from as short as 5,090 trading days for Citigroup, to as long as 
21,527 trading days for Exxon-Mobil. The availability of a large sample of daily close is 
particularly important as we will be numerically assessing the validity of each step in the 
construction of our model. With such large collections of stock prices, standard errors in 
our calculation can be as small as 0.7%, and as such, we are able to say with a great deal 
of certainty whether or not our approach is self-consistent. Third, the 24 chosen were the
simplest, in terms of their ownership, of the 30 stocks listed in the DJIA. The six DJIA
stocks not chosen were recently involved in mergers or acquisitions, which introduces unwanted
complications; an assessment of the temporal behavior of these stock prices may not be as clear cut
as for the 24 stocks considered here.

A detailed description of how the time-series are constructed is given in Appendix A,
where any particularities in the analysis of stocks are listed as well. A list of these stocks
given in terms of their stock symbol is presented in Table I along with the starting date 
of the time-series and the total number of daily yields in each. The ending date for all 24 
time-series is December 29, 2006. 

IV. A TEMPORAL MODEL OF STOCK MARKET PRICES 

We begin our study of the temporal behavior of stock market prices with an application 
to finance. Specifically, for the 24 stocks considered here we study whether the daily yield 
on December 29, 2006 depends on the daily yield on any day previous to it. This property 
of the market, which has direct implications in finance, will be used as the starting point 
for the construction of our model of stock prices. 

A. An Inherent Contradiction 

Shown in Fig. 1 is a graph of the autocorrelation function of the daily yield for Coca Cola
using Eq. (B6) from Appendix B. This autocorrelation is calculated between the daily
yield of the stock on December 29, 2006, and the daily yield T days before the 29th. The 
graph thus shows the dependency of the yield on the 29th on the yield on any previous day. 
If the yield on the 29th depends on the yield on day T, then the autocorrelation function 
will not vanish on that day at the 95% CL. If, on the other hand, the yield on the 29th 
does not depend on the yield on day T, then the autocorrelation function will be within 
statistical error of zero. 

Also shown on the graph in Fig. 1 is the errorbar for each of the calculated values of the
autocorrelation function. These errorbars are set at the 95% CL, which is 1.96 times the
standard error calculated using Eq. (B11) for the autocorrelation function on that day. They
thus set the 95% confidence interval (CI) about the calculated value for the autocorrelation
function. If the value of the autocorrelation function falls within its errorbar of zero, there is
a 95% probability that the autocorrelation on this day equals zero. With 21,522 total trading
days in the time-series for Coca Cola, the standard error for the values of the autocorrelation
function shown in the graph is roughly 0.7%, and is thus quite small; the errorbars shown
are correspondingly small. The standard errors for the majority of the stocks studied here
are equally small.

FIG. 1: The autocorrelation of the daily yield for Coca Cola is shown in the main figure, with
the time, T, labeling the number of trading days before December 29, 2006. Also included at each
data point are errorbars set at ±1.96 times the standard error. In the inset, graphs of the sample
skewness and the kurtosis of the stock calculated using a 251-day moving average are shown.

All but one of the errorbars for the autocorrelation shown in Fig. 1 straddle the x-axis.
As such, we can say that the value of the autocorrelation function for T > 0 is within a 95%
CL of zero for all but one day. Indeed, when we continue this calculation all the way back
to December 31, 1925, the starting date for the time-series, we find that the autocorrelation
function for the daily yield on 20,279 out of a total of 21,522 trading days falls within the
95% CI of zero; on 1,243 trading days, or 6% of the trading days, it falls outside of the
95% CI (see Table I). This does not necessarily mean that there is a
correlation between the 29th and these 1,243 trading days, however. Statistically, we would
expect values of the autocorrelation function to exceed the 95% CI on 5%, or 1,076, of the
trading days. We can only conclude that on at least 1%, or 215, of the trading days the
autocorrelation function does not vanish for T > 0. If instead a 99% CI is chosen, we find
that the value of the autocorrelation function falls within the 99% CI of zero for 21,172 out
of 21,522 trading days; it falls outside of the 99% CI on only 2%, or 350, of the trading
days. We can therefore still conclude that for at least 1%, or 215, of the trading days the
autocorrelation function may not vanish for T > 0.
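
The counting argument above can be illustrated with a minimal Python sketch. It is not the calculation used for Table I: it assumes the textbook sample autocorrelation estimator and the large-sample standard error 1/sqrt(N), rather than Eqs. (B6) and (B11) of the Appendix, and the function names and the simulated series are hypothetical.

```python
import math
import random

def autocorrelation(x, lag):
    """Standard sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return cov / var

def count_outside_ci(x, max_lag, z=1.96):
    """Count lags 1..max_lag whose autocorrelation lies outside +/- z/sqrt(N).

    The band z/sqrt(N) is the large-sample approximation for an
    uncorrelated series (an assumption of this sketch).
    """
    bound = z / math.sqrt(len(x))
    return sum(1 for lag in range(1, max_lag + 1)
               if abs(autocorrelation(x, lag)) > bound)

# Simulated, uncorrelated "daily yields" (illustrative data, not market data).
rng = random.Random(0)
yields = [rng.gauss(0.0, 0.01) for _ in range(2000)]
outside = count_outside_ci(yields, max_lag=200)
print(f"{outside} of 200 lags fall outside the 95% band")
```

For an uncorrelated series, roughly 5% of the lags are expected to fall outside the 95% band by chance alone, which is why only the excess over 5% is attributed above to a non-vanishing autocorrelation.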

The autocorrelation function of the daily yield for all 24 stocks has been calculated over
the length of their time-series, and we have found that the autocorrelation function for these
stocks behaves similarly to Coca Cola's. Namely, the autocorrelation function is maximum
at T = 0, and it does not vanish for at least 1% to 3% of the trading days for each stock; for
Citigroup and Verizon, this percentage is even lower. We may conclude from this analysis
that for the vast majority of the time the daily yield of these stocks on any one day is not
correlated with the daily yield on any subsequent day; the market is thus extremely efficient
for these 24 stocks. In addition, we will show below that for the 1% to 3% of the trading days
when the autocorrelation function does not vanish, this is due to changes in the volatility of
the stock with time, and not to correlations between daily yields.

Based on the above analysis, it would seem that the usual stochastic process with a 
constant volatility would be a good model for these stocks. The lack of dependence of the 
daily yield on any one day from any other is precisely the property inherent in such a model. 
There are, however, other properties of the distribution of daily yields for stocks that any 
model would have to explain as well, and it is here that the constant-volatility model of 
stocks is lacking. 

Shown in the insert of Fig. 1 is the sample skewness of the daily yields for Coca Cola
calculated using a 251-day moving average. If the price of the stock is indeed well
described by a stochastic process with a drift and a constant volatility, then we would expect
the skewness of the daily yield to be zero. For Coca Cola, we find that the skewness ranges
from −5.3 ± 6.5 to 4.4 ± 5.3. Although the skewness is large in magnitude, its standard error
is correspondingly large, and we find that the skewness exceeds the 95% CI of zero on only
943 out of 21,022 days, or 4% of the time. Thus, the sample skewness calculated using a
251-day moving average agrees with what is expected from modeling the yield of the stock
using a stochastic process with constant volatility.

The situation is quite different for the kurtosis, however. Shown also in the insert of
Fig. 1 is the sample kurtosis of the daily yields for Coca Cola calculated with the same
251-day moving average. Although the kurtosis for a daily yield described by a stochastic
process with a constant volatility is expected to be three, what we find instead is that the
sample kurtosis calculated for the Coca Cola time-series ranges in value from 2.92 ± 0.22 to
122 ± 49. Like the skewness, the standard error for the kurtosis is large when the kurtosis
is large, but unlike the skewness, the error is not overwhelmingly large. We find that the
kurtosis exceeded the 95% CI of three on 15,393 out of 21,022 days, or 72% of the time.
For the great majority of the trading days in the time-series, the kurtosis is different from
that expected for a stochastic process with constant volatility.
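
A moving-window calculation of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the moment-ratio estimators and the helper names are hypothetical choices, and the standard error passed to `fraction_outside` must be supplied by the caller (for example the Gaussian large-sample value sqrt(24/n) for the kurtosis), since the standard errors quoted above are computed from the Appendix formulas, which are not reproduced here.

```python
def skew_kurt(window):
    """Sample skewness and (non-excess) kurtosis from central moments."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((v - mean) ** 2 for v in window) / n
    m3 = sum((v - mean) ** 3 for v in window) / n
    m4 = sum((v - mean) ** 4 for v in window) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def rolling_skew_kurt(yields, window=251):
    """Skewness and kurtosis over each trailing window of the series."""
    return [skew_kurt(yields[i - window:i])
            for i in range(window, len(yields) + 1)]

def fraction_outside(values, expected, standard_error, z=1.96):
    """Fraction of values farther than z standard errors from `expected`."""
    hits = sum(1 for v in values if abs(v - expected) > z * standard_error)
    return hits / len(values)
```

Applied to a series of daily yields, `rolling_skew_kurt` returns one (skewness, kurtosis) pair per window position; comparing the kurtosis values with three through `fraction_outside` gives the kind of percentage quoted in the text.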

We have done this calculation for all 24 stocks, and these results are not unique to Coca
Cola. This, then, is the contradiction inherent in using a stochastic process with constant
volatility to model stock market prices. On the one hand, calculations of the autocorrelation
function indicate that the market is extremely efficient for these stocks, which is consistent
with a stochastic process with constant volatility. On the other hand, the calculated
kurtosis for these stocks is much larger than expected for such a process. We will use this
contradiction to guide the construction of our model in the analysis below.

TABLE I: Autocorrelations for the 24 DJIA Stocks

                                Daily Yield                Standardized Daily Yield
Stock  Starting Date  N_T       > 95% CI     > 99% CI      > 95% CI     > 99% CI
C      10/29/86       5088      227 (4%)     53 (1%)       256 (5%)     45 (1%)
MSFT   03/13/86       5248      221 (4%)     63 (1%)       292 (6%)     60 (1%)
VZ     02/16/84       5770      236 (4%)     58 (1%)       306 (5%)     57 (1%)
INTC   12/14/72       8592      516 (6%)     159 (2%)      466 (5%)     93 (1%)
AXP    12/14/72       8592      412 (5%)     101 (1%)      391 (5%)     78 (1%)
AIG    12/14/72       8592      470 (5%)     137 (2%)      470 (5%)     96 (1%)
WMT    11/20/72       8608      458 (5%)     118 (1%)      415 (5%)     85 (1%)
HPQ    03/03/61       11524     610 (5%)     169 (1%)      575 (5%)     112 (1%)
DIS    11/12/57       12386     741 (6%)     182 (1%)      624 (5%)     135 (1%)
AA     06/11/55       14014     688 (5%)     153 (1%)      710 (5%)     158 (1%)
MRK    05/15/46       15444     946 (6%)     233 (2%)      835 (5%)     190 (1%)
MMM    01/15/46       15544     733 (5%)     253 (2%)      724 (5%)     145 (1%)
JNJ    09/25/44       15920     759 (5%)     163 (1%)      734 (5%)     145 (1%)
PFE    01/17/44       16126     812 (5%)     181 (1%)      812 (5%)     181 (1%)
BA     09/04/34       18946     1281 (7%)    360 (2%)      980 (5%)     200 (1%)
CAT    12/02/29       20358     1691 (8%)    702 (3%)      1027 (5%)    239 (1%)
PG     08/12/29       20442     1644 (8%)    678 (3%)      1026 (5%)    223 (1%)
GE     12/31/25       21518     1474 (7%)    545 (3%)      1104 (5%)    242 (1%)
GM     12/31/25       21518     1370 (6%)    488 (2%)      1120 (5%)    251 (1%)
DD     12/31/25       21520     1520 (7%)    532 (2%)      1071 (5%)    215 (1%)
MO     12/31/25       21520     1804 (8%)    762 (4%)      1047 (5%)    242 (1%)
IBM    12/31/25       21522     1387 (6%)    461 (2%)      1095 (5%)    193 (1%)
KO     12/31/25       21522     1243 (6%)    350 (2%)      1124 (5%)    230 (1%)
XOM    12/31/25       21526     1498 (7%)    477 (2%)      1096 (5%)    220 (1%)



B. The Continuous Model 

In this section, we show that for continuous stochastic models of stock prices with a
deterministic volatility that changes with time, the yield at time $t$ does not depend on the
yield at any other time $t'$. Such a model is thus able to reproduce the properties of the
autocorrelation function found for the 24 stocks above. In a later section, we will show that
the time-variation in the volatility can also explain the abnormally large sample kurtosis.

Take as the price of the stock at any time, $t$, the continuous function $S(t)$. This is an
approximation, of course. Stocks are bought and sold in discrete time periods, and the
prices of these transactions are always recorded in discrete units. It is, however, easier to
develop an understanding of the model, and to show a number of its properties, using this
continuous approximation instead of a discrete time-series of stock prices. In the next
section, when we develop a recursion relation for the volatility, we will consider real-world
data, and will discretize the continuous model presented here.

Our model for $S(t)$ is a stochastic process with a drift, $\bar{\mu}(t)$, and a volatility, $\sigma(t)$, that
change only with time:

$$\frac{1}{S}\frac{dS}{dt} = \bar{\mu}(t) + \sigma(t)\,\xi(t), \tag{1}$$

where $\xi(t)$ is a Gaussian random variable such that

$$E[\xi(t)] = 0, \quad \text{and} \quad E[\xi(t)\xi(t')] = \delta(t - t'). \tag{2}$$

Here, $E[\xi]$ is the expectation value of $\xi$ over a Gaussian distribution, and $\delta(t)$ is the Dirac
delta function. We emphasize that while Eq. (1) may have a form that is similar to various
stochastic volatility models of the stock market, for us $\sigma(t)$ is a deterministic function of
time; it does not have the random component that is inherent in stochastic volatility models.
As usual, it is more convenient to work with $u(t) = \ln[S(t)]$; for continuous compounding,
$du/dt$ is then the instantaneous yield of the stock. In terms of $u(t)$, Eq. (1) reduces to

$$\frac{du}{dt} = \mu(t) + \sigma(t)\,\xi(t), \tag{3}$$

where $\mu(t) = \bar{\mu}(t) - \sigma^2(t)/2$.

It is straightforward to show that for this stochastic process the instantaneous yield at
time $t$ does not depend on the yield at any other time $t'$. To do so, consider the expectation
value

$$E\left[\left(\frac{du(t)}{dt} - \mu(t)\right)\left(\frac{du(t')}{dt'} - \mu(t')\right)\right] = E[\sigma(t)\sigma(t')\xi(t)\xi(t')]. \tag{4}$$

Because $\sigma(t)$ is a deterministic function, it can be moved outside the expectation value so
that $E[\sigma(t)\sigma(t')\xi(t)\xi(t')] = \sigma(t)\sigma(t')E[\xi(t)\xi(t')]$. Using Eq. (2), we then conclude that

$$E\left[\left(\frac{du(t)}{dt} - \mu(t)\right)\left(\frac{du(t')}{dt'} - \mu(t')\right)\right] = \sigma(t)^2\,\delta(t - t'), \tag{5}$$

so that the autocorrelation function of the instantaneous yield vanishes unless $t = t'$; the
yield of the stock at any one time does not depend on the yield at any other time. Our
model thus describes a market for the stock that is efficient. This is to be expected. At
each instant, $t$, Eq. (3) describes a Gaussian distribution with drift, $\mu(t)$, and volatility, $\sigma(t)$,
and it is well known that for a Gaussian distribution the daily yield on any one day is not
correlated with the daily yield on any other.

Note that if the volatility were a function of $u$ as well as $t$, or if it were itself a stochastic
process, as it is taken to be in stochastic volatility models, we could not have moved the
volatilities outside the expectation value to obtain Eq. (5). In these cases, it is not clear
whether the yield of the stock at any one time depends on the yield at any other time.
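
Both properties of the model, uncorrelated yields together with a kurtosis larger than three, can be checked numerically. The following Python sketch is purely illustrative: the volatility path (alternating between 1% and 3% every 500 steps), the zero drift, and all function names are hypothetical choices, not values taken from the data. For this particular path the population kurtosis of the pooled yields is 3 E[σ^4]/E[σ^2]^2 = 4.92, even though each individual yield is Gaussian.

```python
import random

def simulate_yields(sigmas, mu=0.0, seed=0):
    """Draw one Gaussian yield per step with a prescribed (deterministic) volatility."""
    rng = random.Random(seed)
    return [mu + s * rng.gauss(0.0, 1.0) for s in sigmas]

def lag1_autocorrelation(x):
    """Standard sample autocorrelation at lag one."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    return cov / var

def kurtosis(x):
    """Sample (non-excess) kurtosis; equals 3 for a Gaussian population."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2

# Hypothetical deterministic volatility path: 1% and 3% in alternating blocks.
sigmas = [0.01 if (i // 500) % 2 == 0 else 0.03 for i in range(20000)]
yields = simulate_yields(sigmas)
print("lag-1 autocorrelation:", round(lag1_autocorrelation(yields), 4))
print("kurtosis of pooled yields:", round(kurtosis(yields), 2))
```

The autocorrelation stays within sampling error of zero while the pooled kurtosis sits well above three, which is the combination of properties that the constant-volatility model cannot produce.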

Formally, the solution to Eq. (3) is straightforward. If $\sigma(t) > 0$ for all $t$, divide through
by $\sigma(t)$, and then reparametrize time by taking

$$\tau = \int_0^t \sigma(s)\,ds. \tag{6}$$

Equation (3) then simplifies to

$$\frac{du}{d\tau} = \hat{\mu}(\tau) + \xi(\tau), \tag{7}$$

where

$$\hat{\mu}(\tau(t)) - \frac{\mu(t)}{\sigma(t)} = 0, \tag{8}$$

and $\xi$ is still a Gaussian random variable, but now in $\tau$. Equation (7) is simply a stochastic
process with drift $\hat{\mu}(\tau)$ and unit volatility; its solution in terms of $\tau$ is well known. The
solution to the original equation, Eq. (3), can then be obtained, at least in principle, by
integrating Eq. (6), and then replacing $\tau$ with the resulting function of $t$.

In practice, our task is much more difficult. We are not given a $\sigma(t)$, and then asked to
find the price, $S(t)$, of the stock at subsequent times. We are instead given a collection of
stock prices collected over some length of time, and then asked to find the volatility. This
is a much more difficult problem, but surprisingly, it is a solvable one, as we will see in the
next subsection.

FIG. 2: A comparison between the distribution of daily and standardized daily yields for Coca 
Cola is given in Figs. 2a and b. The binomial behavior of the standardized daily yield can clearly 
be seen in Fig. 2b. The resultant drift for the standardized daily yield is shown in Fig. 2c. 

C. The Discrete Process and a Recursion Relation for $\sigma(t)$



In this subsection, we derive a recursion relation that is used to solve for the volatility
as a function of time. This derivation is most conveniently done using a discretized version
of the continuous stochastic process, Eq. (3), considered above, and we consider $S(t)$ as a
continuous approximation to the discrete time series, $S_n$, for $n = 1, \ldots, N_T$, of stock prices
collected at equal time intervals, $a$; this $a$ is usually taken as one trading day. The subscript
$n$ enumerates the time step when the price of the stock was collected, and is an integer that
runs from 0 to the total number of data points, $N_T$. As such, $t = na$, $T = N_T a$, $S_n = S(na)$,
$u_n = u(na)$, and $\sigma_n = \sigma(na)$ is the volatility at $t = na$. The instantaneous daily yield is
then

$$\frac{du}{dt} \approx \frac{u_n - u_{n-1}}{a} = \frac{\Delta u_n}{a}, \tag{9}$$

where $n \geq 1$. It is clear that $\Delta u_n = \ln(S_n/S_{n-1})$ is the yield of $S(t)$ over the time period $a$;
when $a$ is one trading day, $\Delta u_n$ is the daily yield.

Our task in this section is to determine $\sigma_n$ given the time-series $S_n$, and we do so by
making use of the analysis in the previous section. We call

$$\frac{\Delta\hat{u}_n}{a} \equiv \frac{\Delta u_n}{a\sigma_n}, \tag{10}$$

the standardized yield of the stock price over a time period $a$, and if $a$ is one trading day,
we call it the standardized daily yield. Since

$$\frac{1}{\sigma(t)}\frac{du}{dt} \approx \frac{\Delta u_n}{a\sigma_n}, \tag{11}$$

then from the discretized versions of Eqs. (7) and (8), we see that the distribution of
standardized yields has a volatility of $1/a$, or one if $a$ is set to one trading day. The collection
of standardized yields thus has a known volatility.

Consider now a subset of the time-series with $N < N_T$ elements, and the corresponding
collection of standardized yields, $\Delta\hat{u}_n$, where $n$ now runs from 1 to $N$. Because this
subset was arbitrarily selected from a collection of standardized yields that has a volatility
of $1/a$, this subset must also have a volatility of $1/a$. As such,

$$\frac{1}{a^2} = \frac{1}{N-1}\sum_{n=1}^{N}\left(\frac{\Delta u_n}{a\sigma_n} - \frac{1}{N}\sum_{m=1}^{N}\frac{\Delta u_m}{a\sigma_m}\right)^2, \tag{12}$$

which holds to within the standard error for the volatility given $N$ data points (see Stuart
and Ord 1994); the accuracy of Eq. (12) thus depends on $N$. Equation (12) must be true for
each $N < N_T$. In particular, it must hold for $N - 1$, and thus we can write

$$\frac{1}{a^2} = \frac{1}{N-2}\sum_{n=1}^{N-1}\left(\frac{\Delta u_n}{a\sigma_n} - \frac{1}{N-1}\sum_{m=1}^{N-1}\frac{\Delta u_m}{a\sigma_m}\right)^2, \tag{13}$$

which is similar in form to Eq. (12). This self-similar property of the distribution is used to
determine $\sigma_n$, as we show below.

We first expand Eq. (12), and single out the $n = N$ terms:

$$\frac{1}{a^2} = \frac{1}{N-1}\left(\frac{\Delta u_N}{a\sigma_N} - \frac{1}{N}\sum_{m=1}^{N}\frac{\Delta u_m}{a\sigma_m}\right)^2 + \frac{1}{N-1}\sum_{n=1}^{N-1}\left(\frac{\Delta u_n}{a\sigma_n} - \frac{1}{N}\frac{\Delta u_N}{a\sigma_N} - \frac{1}{N}\sum_{m=1}^{N-1}\frac{\Delta u_m}{a\sigma_m}\right)^2, \tag{14}$$

where we have dropped the error terms in Eq. (12) for clarity. Using Eq. (13) in the second
term of Eq. (14) and completing a square, we arrive at a surprisingly simple equation for $\sigma_N$:

$$\frac{N}{N-1}\,\frac{1}{a^2} = \left(\frac{\Delta u_N}{a\sigma_N} - \frac{1}{N-1}\sum_{m=1}^{N-1}\frac{\Delta u_m}{a\sigma_m}\right)^2. \tag{15}$$

This is easily solved to give

$$\sigma_N = \Delta u_N\left[\frac{1}{N-1}\sum_{m=1}^{N-1}\frac{\Delta u_m}{\sigma_m} \pm \sqrt{\frac{N}{N-1}}\,\right]^{-1}, \tag{16}$$

where the sign of the root must be chosen so that $\sigma_N > 0$ for all $N$. The standardized
yield, $\Delta\hat{u}_N$, can then be calculated using Eq. (10) for each time step. Equation (16) gives a
recursion relation for $\sigma_n$.
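
To make the structure of the recursion concrete, the following Python sketch iterates it for a given list of yields. It is a minimal illustration and not the C++ program used for the results in this paper. It assumes $a = 1$ and reads the recursion as: the standardized yield at step $N$ equals the mean of the previous $N - 1$ standardized yields plus or minus $\sqrt{N/(N-1)}$, with the sign taken from $\Delta u_N$ so that the volatility is non-negative. The function name, the argument `z1` (the starting value of $\Delta u_1/\sigma_1$), and the sample yields in the usage lines are all hypothetical.

```python
import math

def volatility_recursion(du, z1):
    """Iterate the recursion on yields du = [du_1, ..., du_NT], taking a = 1.

    z1 is the assumed initial standardized yield du_1 / sigma_1 (nonzero).
    Returns (sigma, z) with z[n] = du[n] / sigma[n].
    """
    if z1 == 0:
        raise ValueError("z1 must be nonzero")
    z = [z1]
    sigma = [du[0] / z1]
    running_sum = z1
    for N in range(2, len(du) + 1):
        mean_prev = running_sum / (N - 1)   # mean of z_1 .. z_{N-1}
        root = math.sqrt(N / (N - 1))
        # Choose the root whose sign matches du_N, so that
        # sigma_N = du_N / z_N >= 0 whenever |mean_prev| < root.
        z_N = mean_prev + root if du[N - 1] >= 0 else mean_prev - root
        z.append(z_N)
        sigma.append(du[N - 1] / z_N)
        running_sum += z_N
    return sigma, z

# Usage with made-up yields and a made-up starting value.
du = [0.01, -0.02, 0.015, -0.005, 0.03]
sigma, z = volatility_recursion(du, z1=0.5)
print([round(s, 5) for s in sigma])
print([round(v, 4) for v in z])
```

By construction, the sample variance (with an $N - 1$ denominator) of the first $N$ standardized yields produced this way equals one for every $N \geq 2$, which is the self-similar property the derivation relies on. The sketch does not enforce the positivity constraint beyond the sign choice; a caller would need to check `sigma` separately.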

A recursive approach to calculating the volatility similar in spirit to the one above is
described in Stuart and Ord (1994). That calculation is for volatilities that do not change
with time, however, while in ours the volatility can do so explicitly. As we will see below,
this introduces a number of complications. We note also that Eq. (16) differs markedly from
autoregression approaches such as the EWMA, ARCH, and GARCH in that $\sigma_N$ depends
nonlinearly on $\Delta u_N$.

Equation (16) gives a first-order recursion relation for $\sigma_n$, and thus given an initial $\sigma_1$,
the values of $\sigma_n$ for $n > 1$ are determined. To determine this initial $\sigma_1$, we note that in
the continuous process Eq. (8) holds. A similar relation must hold for the discretized yields
$\Delta u_n$.

To determine this relation, we follow the same approach that led to Eq. (12), and consider
the following function:

$$f_N = \frac{1}{N}\sum_{n=1}^{N}\frac{\Delta u_n}{a\sigma_n} - \frac{1}{\sigma_N}\left(\frac{1}{N}\sum_{n=1}^{N}\frac{\Delta u_n}{a}\right). \tag{17}$$

The first term in Eq. (17) is the average of the standardized yield over the first $N$ terms
in the time-series, and it corresponds to the discretization of the first term in the continuous
constraint, Eq. (8). The second term is the quotient of the average daily yield calculated over
the same period with the volatility evaluated at the end of this period, and it corresponds to
the discretization of the second term in the continuous constraint, Eq. (8). If $\sigma_1$ can be chosen
so that the mean of $f_N$,

$$\frac{1}{N_T}\sum_{N=1}^{N_T} f_N, \tag{18}$$

can be minimized to zero at the 95% CL, then Eq. (8) will hold on average for the
discretized yield. As usual, the 95% CL for this mean is calculated through the standard error,
$[D(f_N)/N_T]^{1/2}$, where

$$D(f_N) = \frac{1}{N_T - 1}\sum_{N=1}^{N_T}\left(f_N - \frac{1}{N_T}\sum_{M=1}^{N_T} f_M\right)^2 \tag{19}$$

is the sample variance of $f_N$.
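
The quantities entering this test can be sketched in a few lines of Python. The sketch assumes $a = 1$, takes $f_N$ to be the running mean of the standardized yields minus the running mean of the yields divided by the current volatility, and uses the usual standard error of a mean with an $N_T - 1$ denominator in the variance; these readings and the function names are assumptions made for illustration only.

```python
import math

def f_values(du, sigma):
    """f_N for N = 1..len(du), taking a = 1.

    f_N = mean of du_n/sigma_n over n <= N, minus (mean of du_n over n <= N) / sigma_N.
    """
    out, sum_standardized, sum_raw = [], 0.0, 0.0
    for N, (d, s) in enumerate(zip(du, sigma), start=1):
        sum_standardized += d / s
        sum_raw += d
        out.append(sum_standardized / N - (sum_raw / N) / s)
    return out

def mean_and_standard_error(f):
    """Mean of f and the standard error of that mean."""
    n = len(f)
    mean = sum(f) / n
    variance = sum((v - mean) ** 2 for v in f) / (n - 1)
    return mean, math.sqrt(variance / n)
```

A quick sanity check of the construction: when the volatility is the same at every step, the two terms of $f_N$ cancel and $f_N$ vanishes identically, so any nonzero mean reflects the time dependence of $\sigma_n$ together with the choice of $\sigma_1$.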

We have successfully applied the recursion relation, Eq. (16), to the 24 DJIA stocks considered
here, and have obtained for each stock the time-series for $\sigma_n$ and $\Delta\hat{u}_n$. This was done by
determining the quotient $\Delta u_1/\sigma_1$ through an iterative search algorithm that was implemented
with a simple C++ program. This algorithm searches for a $\sigma_1$ that drives the mean of $f_N$
to zero while in the process minimizing $D(f_N)$. In addition, since the volatility must be
non-negative, this search is done under the constraint that all calculated values for $\sigma_n$ must
be greater than or equal to zero, and it was stopped once the mean of $f_N$ had been calculated to
sufficient accuracy.

A $\Delta u_1/\sigma_1$ that minimizes $f_N$ while at the same time giving a non-negative value for the
volatility can be found for all 24 stocks. Indeed, we found that the mean of $f_N$ can be driven
as close to zero as needed. The results of this calculation are given in Table II, which lists for
each stock the value of $\Delta u_1/\sigma_1$, the mean of $f_N$ for this $\Delta u_1/\sigma_1$, and the standard error of
the mean. While values for $\Delta u_1/\sigma_1$ are only given to an accuracy of $10^{-8}$, which is sufficient
given the accuracy of the $S_n$ for the stocks as noted in Appendix A, we have been able to
drive the mean value of $f_N$ still closer to zero by increasing the accuracy of $\Delta u_1/\sigma_1$ further.
It is clear from the standard errors given in Table II that the mean of $f_N$ vanishes
within standard error at the 95% CL. This validates the recursion relation for $\sigma_n$ for all 24
stocks.

Implicit in the derivation of Eq. (12) is that $N$ is large, and yet since $\sigma_N$ starts at some
initial point $\sigma_1$, the $\sigma_N$ are necessarily generated at small $N$. We would thus expect that there
is a transient interval marked by some $N_{tran} < N_T$ for which the solution to Eq. (16) for
$N < N_{tran}$ is markedly different from the solution when $N > N_{tran}$. This is indeed seen. For
all 24 stocks, the behavior of $\Delta\hat{u}_n$ for $n$ near one is different from its behavior for large
$n$. This difference is similar for all of the stocks, indicating that it is due to the recursion
process itself, and not to any underlying behavior of the markets. We would thus hesitate
to use the calculated values for $\sigma_n$ when $n < N_{tran}$ to draw conclusions about the behavior
of the stock. The length of this interval, $N_{tran}$, varies from stock to stock, but typically
ranges between 100 and 400 trading days. Given that the shortest time-series considered here
contains 5,088 trading days, this interval is extremely short for all 24 stocks, and is not
relevant in practice. This is yet another reason why we have chosen stocks that have a long
track record to analyze.

TABLE II: Determining σ_1 for the 24 DJIA Stocks

Stock  Δu_1/σ_1       Mean f_N         SE for f_N
GE     -1.29986151     3.70x10^-12     2.51x10^-?
AXP    -1.11059569    -4.56x10^-?      7.60x10^-6
PFE    -0.69103252     6.37x10^-9      6.02x10^-6
DIS    -0.62361094    -1.66x10^-?      1.08x10^-5
MSFT   -0.23500530    -4.61x10^-11     3.36x10^-5
KO     -0.00633264     8.28x10^-12     3.39x10^-6
PG      0.29288976    -2.68x10^-5      2.47x10^-2
GM      0.29925834     1.19x10^-11     5.08x10^-6
AIG     0.30999620    -2.73x10^-11     5.90x10^-6
MMM     0.36915581    -4.67x10^-11     5.32x10^-6
AA      0.37469224     1.30x10^-12     4.77x10^-6
HPQ     0.64209911     1.55x10^-11     6.24x10^-6
JNJ     0.96565296    -1.96x10^-11     5.54x10^-6
CAT     0.98788045    -2.20x10^-11     5.27x10^-6
INTC    0.99040135    -6.99x10^-12     1.??x10^-?
WMT     0.99093623     5.31x10^-?      1.53x10^-?
C       1.05691370     1.41x10^-10     5.76x10^-?
IBM     1.12634844    -1.08x10^-11     5.73x10^-?
MO      1.18101555     1.15x10^-11     1.35x10^-6
VZ      1.30751627     2.03x10^-?      1.40x10^-5
DD      1.34345528    -4.88x10^-12     1.17x10^-5
BA      1.41323601    -2.74x10^-11     1.36x10^-5
XOM     1.41414287    -1.88x10^-?      5.08x10^-3
MRK     1.41417225    -9.05x10^-?      7.72x10^-3


V. THE STANDARDIZED DRIFT AND ITS DISTRIBUTION 



Although the recursion relation, Eq. (16), has been successfully solved for all 24 stocks,
we will delay until Sec. VI to present the solutions to this equation. Instead, we will first
validate our model by showing that the stochastic process introduced in the previous sections
solves the overly large kurtosis problem raised in Sec. IV. In the process, we will find that
the distribution of standardized yield is a generalized Rademacher distribution, and will
show that the sample skewness and kurtosis agree with the values for population skewness
and kurtosis for this distribution. By doing so, we will also have validated our model of
stock market prices using the criteria outlined in Sec. III. As part of this process, we will
be able to determine the drift of the yield as a function of time as well.



A. Observed Properties of the Standardized Yield 

To complement the autocorrelation calculation for the daily yield shown in Fig. 1, we have
calculated the autocorrelation function for the standardized daily yield, $G^{(2)}(\Delta\hat{u}_{N_T}, \Delta\hat{u}_{N_T-M})$,
for all 24 DJIA stocks. A plot of $G^{(2)}(\Delta\hat{u}_{N_T}, \Delta\hat{u}_{N_T-M})$ as a function of $T = aM$ has the
same shape as that shown in Fig. 1, but with $G^{(2)}(\Delta\hat{u}_{N_T}, \Delta\hat{u}_{N_T}) = 1$ for all the stocks
instead of a range of values. Like the daily yield, the standardized daily yield on any one
day is not correlated with the standardized yield on any other day; this is to be expected
if the volatility is a function of time only. We have also determined the number of trading
days for which the value of the autocorrelation function falls within the 95% and 99% CI
of zero. The results are shown in Table I, and for all but one stock, the results are as
expected. On 5% of the trading days, the value of the autocorrelation function exceeds the
95% CI, and on 1% of the trading days, the value exceeds the 99% CI. The only exception
is Microsoft at the 95% CL, where on 6% of the trading days the autocorrelation function
exceeds the 95% CI of zero.

In Sec. IV we noted that on at least 1% of the trading days the value of the autocorrelation
function for the daily yield exceeds either the 95% or 99% CI of zero, and we can say with
a degree of statistical certainty that for these days, the autocorrelation function does not
vanish. With the exception of Microsoft at the 95% CL, such days are not found in the
autocorrelation function of the standardized yields. Since the standardized yield is obtained
from the yield by removing the time-dependent volatility, we conclude from the results in
Table I for the standardized yield that, with the possible exception of Microsoft, this 1% is
not due to correlations in the daily yield, but rather to temporal variations in the volatility.

Next, shown in Fig. 2a is a plot of the daily yield with respect to trading day for Coca
Cola. In comparison, Fig. 2b is the plot of the standardized daily yield for the stock
over the same period. It is readily apparent that instead of taking a range of values between
±0.3 as the daily yield does, the standardized yield jumps between two values, one near
+1 and one near −1. Notice also that while the standardized yield is not precisely +1 or
−1, any changes in the standardized yield near +1 are accompanied by the same variations
of the yield near −1; the variations in the standardized yield near +1 and near −1 would
seem to move up or down in unison. Indeed, using a 251-day moving average, we find that
the average of the difference in the value of the standardized daily yield, $\Delta_n^{(+)}$, near +1 and
its value, $\Delta_n^{(-)}$, near −1 ranges from a minimum of $(\Delta^{(+)} - \Delta^{(-)})/2 = 0.9934 \pm 0.0061$ to a
maximum of $(\Delta^{(+)} - \Delta^{(-)})/2 = 1.0083 \pm 0.0068$; both are within the 95% CI of one.

This binomial behavior for the yield is not surprising for the same reasons that binomial
trees are effective at pricing options. As noted by Cox and Ross (1976), a continuous
stochastic process with constant volatility can be approximated as a discrete random walk
where at each time step, $na$, there is a probability, $p$, that the stock price will increase at the
next time step, and a probability $1 - p$ that it will decrease. The discrete stochastic process
can thus be approximated by a binomial distribution, and as the binomial distribution is
known to approach the Gaussian distribution in the large $n$ limit, the discrete random walk
approaches a continuous stochastic process for the stock price. Indeed, this limit is the
reason why binomial trees are effective in the first place.

B. Determining the Drift for the Standardized Yield 

In this subsection, we will determine the drift of the standardized daily yield as a
function of time. We do so by noting that it is apparent from Fig. 2 that the distribution of
standardized daily yields is a binomial distribution. Thus, at each time step, $n$, there is a
probability, $p$, that the standardized yield will increase by an amount $\Delta_n^{(+)}$ on that day, and
probability, $1 - p$, that it will decrease by an amount $\Delta_n^{(-)}$. While in principle $p$ may be
different at different time steps, the fact that $\Delta_n^{(+)}$ and $\Delta_n^{(-)}$ change in unison while keeping
the average distance between positive and negative standardized yields constant suggests
that any variation in time is due to an overall shift in the distribution. Variations in $\Delta_n^{(+)}$
and $\Delta_n^{(-)}$ are not due to a time-dependent $p$, but rather to a drift for the standardized yield
that changes with time.

With this realization, the drift can easily be determined for all 24 stocks. From Eqs. (7)
and (11), we can express the standardized daily yield as

$$\Delta\hat{u}_n = \hat{\mu}_n + \xi_n, \tag{20}$$

where $\hat{\mu}_n = \hat{\mu}(an)$ is the discretized drift of the standardized yield, and $\xi_n$ is a random
variable with zero mean and unit volatility such that $E_R[\xi_n\xi_m] = \delta_{nm}$. While for the
continuous process $\xi(\tau)$ would be a Gaussian random variable, for the discrete process we will
show that $\xi_n$ is a random variable for the generalized Rademacher distribution described
below.

The standard way of calculating $\hat{\mu}_n$ is to use a moving average over a window of $M$ days.
However, just like for the volatility, calculating $\hat{\mu}_n$ with a moving average will mean that
variations in the drift faster than $M$ cannot be clearly seen. We will instead calculate $\hat{\mu}_n$
directly from $\Delta\hat{u}_n$, which is possible to do because the distribution of standardized yield is
so simple.

We first note that since changes to $\Delta_n^{(+)}$ and $\Delta_n^{(-)}$ are due to shifts in the distribution of
standardized yield with time, these shifts must be due to the drift, $\hat{\mu}_n$, of the standardized
yield. Shifts in random variables are trivial changes to the distribution, however, and a drift
that changes with time will not materially change the distribution of standardized yields.

We next note that $E_R[\xi_n] = 0$. As the values of $\Delta\hat{u}_n$ lie close to ±1, we conclude that
$\xi_n$ can only take the values ±1. Any deviation by $\Delta_n^{(\pm)}$ from ±1 must be due to the drift.
This drift can be determined by solving the equation

$$\hat{\mu}_n = \Delta\hat{u}_n - \xi_n, \tag{21}$$

for $\hat{\mu}_n$ by taking the sign of $\xi_n$ to be the same as the sign of $\Delta\hat{u}_n$. This solution is
straightforwardly implemented, and the results for Coca Cola are shown in Fig. 2c. For clarity, we have
only shown the values of the drift between ±0.15. While there are values that lie outside of
this range, they occur in the first 10 time steps in the series, and are part of the transient
behavior mentioned above.
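
The sign rule for solving Eq. (21) amounts to a one-line calculation, sketched here in Python. The function name and the three sample values are hypothetical; a standardized yield of exactly zero is assigned the positive branch, which is an arbitrary choice of this sketch.

```python
def drift_from_standardized(z):
    """Drift at each step: the standardized yield minus a +/-1 step of the same sign."""
    return [v - (1.0 if v >= 0 else -1.0) for v in z]

# Made-up standardized yields near +1 and -1.
print(drift_from_standardized([1.02, -0.97, 0.99]))
```

For these inputs the drifts are approximately 0.02, 0.03, and −0.01: both the value slightly above +1 and the value slightly above −1 correspond to a positive drift, which is how a shift of the two branches in unison is read as a change in drift.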

The drift for the standardized yield of all 24 stocks has been found using this approach.
Not surprisingly, we find that 21 out of the 24 stocks have a drift that is positive for the great
majority of the time-series. What is surprising is that for three of the 24 stocks (Exxon-Mobil,
Merck, and Procter and Gamble) the drift of the yield of the stock is negative outside
the transient region. For these stocks, the only reason why their price increases is the
random walk, and because the volatility is so much greater than the drift.

C. The Distribution of Standardized Daily Yields 

While in the last subsection we determined the drift of the daily yield, in this subsection
we will show that the distribution of the standardized daily yield is a generalized Rademacher
distribution shifted by the drift, $\hat{\mu}_n$. This will be done by comparing the skewness and
kurtosis of the Rademacher distribution with the sample skewness and kurtosis for $\Delta\hat{u}_n$
after the drift has been removed. We will show that the two agree at the 95% CL, and doing
so will both determine the distribution and validate our model of stock market prices as a
stochastic process with a time-dependent volatility. We begin by describing the properties
of the generalized Rademacher distribution.

A generalized Rademacher distribution consists of a random variable, $\xi_n$, that takes the
value +1 with probability $p$, and the value −1 with probability $1 - p$. We denote the
expectation value for this distribution as $E_R[\cdot]$, and find that the population mean,
$\mathrm{mom}'_1 = E_R[\xi_n]$, is simply

$$\mathrm{mom}'_1 = p - (1 - p) = 2p - 1. \tag{22}$$

This vanishes for $p = 1/2$. The $k$th population moment, $\mathrm{mom}_k = E_R[(\xi_n - E_R[\xi_n])^k]$, is
easily calculated to be

$$\mathrm{mom}_k = 2^k\,p(1 - p)\left[(1 - p)^{k-1} + (-1)^k p^{k-1}\right]. \tag{23}$$

The population variance is thus

$$\mathrm{mom}_2 = 4p(1 - p), \tag{24}$$

while the population skewness is

$$\mathrm{Skew} = \frac{1 - 2p}{\sqrt{p(1 - p)}}, \tag{25}$$

and the population kurtosis for the distribution is

$$\mathrm{Kurt} = \frac{1 - 3p(1 - p)}{p(1 - p)}. \tag{26}$$

Clearly, if $p = 1/2$, then $\mathrm{mom}_2 = 1$, Skew = 0, and Kurt = 1; this is the Rademacher
distribution, which is a special case of the binomial distribution. When $p \neq 1/2$, we call this
the generalized Rademacher distribution.
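
The closed forms for the moments can be checked against a direct two-point enumeration. The short Python sketch below does this; the function names are hypothetical, and the skewness and kurtosis are written as the ratios $\mathrm{mom}_3/\mathrm{mom}_2^{3/2}$ and $\mathrm{mom}_4/\mathrm{mom}_2^2$ reduced to functions of $p$.

```python
import math

def central_moment(p, k):
    """k-th central moment of a +/-1 variable with P(+1) = p, by enumeration."""
    mean = 2 * p - 1
    return p * (1 - mean) ** k + (1 - p) * (-1 - mean) ** k

def closed_form_moment(p, k):
    """Closed form: 2^k p (1-p) [ (1-p)^(k-1) + (-1)^k p^(k-1) ]."""
    return 2 ** k * p * (1 - p) * ((1 - p) ** (k - 1) + (-1) ** k * p ** (k - 1))

def skewness(p):
    return (1 - 2 * p) / math.sqrt(p * (1 - p))

def kurtosis(p):
    return (1 - 3 * p * (1 - p)) / (p * (1 - p))

for p in (0.3, 0.5, 0.8):
    print(p, closed_form_moment(p, 2), skewness(p), kurtosis(p))
```

At $p = 1/2$ the variance and kurtosis are both one and the skewness vanishes. The kurtosis of one is far below the Gaussian value of three, which is why it serves as a sharp test for the standardized yields.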

Given the plot in Fig. 2b, we would expect the distribution of standardized yields
to be a Rademacher distribution with $p = 1/2$ at all time steps. To show that this
is the case, we have calculated the sample skewness and the kurtosis of the standardized
yield after the drift, $\hat{\mu}_n$, has been removed from $\Delta\hat{u}_n$. This has been done for all 24 stocks
using the entire time-series for each. We then compared these sample skewness and kurtosis
with the population skewness and kurtosis for the Rademacher distribution using the t-test.
For completeness, we have also calculated the probability, $p$, for each stock by counting the
total number of $\Delta\hat{u}_n > 0$, and compared it to the Rademacher value of $p = 1/2$ using the
chi-squared test. The results of these calculations and tests are given in Table III. We see
that for all but four of the stocks the fit is exceedingly good; the skewness, the kurtosis, and
the probability all agree at the 95% CL.
TABLE III: Model Validation

          Skewness                     Kurtosis                        Probability
Stock     Mean               t-test    Mean                  t-test    p        χ²
GM         0.001 ± 0.014     0.07      1.00028 ± 0.00030     0.92      0.500    0.01
DD         0.002 ± 0.014     0.15      1.00028 ± 0.00031     0.92      0.499    0.02
VZ        -0.005 ± 0.026     0.18      1.0011  ± 0.0012      0.92      0.501    0.03
DIS        0.010 ± 0.018     0.54      1.00058 ± 0.00063     0.92      0.498    0.29
AXP        0.013 ± 0.022     0.58      1.00086 ± 0.00093     0.92      0.497    0.34
MMM       -0.011 ± 0.016     0.66      1.00050 ± 0.00054     0.93      0.503    0.43
PG        -0.009 ± 0.014     0.67      1.00038 ± 0.00041     0.93      0.502    0.45
JNJ       -0.012 ± 0.016     0.78      1.00053 ± 0.00056     0.94      0.503    0.60
HPQ       -0.015 ± 0.019     0.78      1.00072 ± 0.00078     0.94      0.504    0.61
KO        -0.018 ± 0.013     0.94      1.00044 ± 0.00046     0.96      0.503    0.89
C         -0.028 ± 0.028     0.98      1.0019  ± 0.0020      0.97      0.507    0.96
BA         0.019 ± 0.015     1.28      1.00066 ± 0.00064     1.04      0.495    1.64
AIG       -0.028 ± 0.022     1.32      1.0015  ± 0.0014      1.04      0.507    1.73
INTC      -0.029 ± 0.022     1.34      1.0015  ± 0.0015      1.05      0.507    1.79
PFE       -0.022 ± 0.016     1.42      1.00087 ± 0.00081     1.07      0.506    2.01
CAT       -0.020 ± 0.014     1.42      1.00069 ± 0.00064     1.07      0.505    2.01
GE        -0.020 ± 0.014     1.49      1.00069 ± 0.00063     1.10      0.505    2.21
AA         0.025 ± 0.170     1.50      1.00107 ± 0.00098     1.70      0.494    2.26
MSFT      -0.049 ± 0.028     1.76      1.0035  ± 0.0030      1.19      0.512    3.12
WMT       -0.038 ± 0.022     1.77      1.0022  ± 0.0018      1.19      0.510    3.13
MRK       -0.037 ± 0.016     2.32      1.0018  ± 0.0013      1.40      0.509    5.37
IBM       -0.037 ± 0.014     2.68      1.0016  ± 0.0010      1.55      0.509    7.21
XOM       -0.039 ± 0.014     2.86      1.0018  ± 0.0011      1.63      0.510    8.20
MO        -0.040 ± 0.014     2.94      1.0019  ± 0.0011      1.66      0.510    8.67



Although this agreement is not as good for Altria, Exxon, IBM, and Merck, this is only
because we were comparing them with the Rademacher distribution with $p = 1/2$. We find that
the distribution of standardized yields for these stocks is instead the generalized Rademacher
distribution with a $p$ slightly greater than 1/2. Using Eq. (22) and the sample mean for
these stocks, we have solved for a probability, $p$, for the stocks. We find that this probability
is in close agreement with those listed in Table III for these stocks, and when this $p$ is then
used in Eqs. (25) and (26) to predict values for the skewness and kurtosis, the predicted
values are now in agreement with the sample skewness and kurtosis at the 95% CL.
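The procedure just described can be sketched as follows. The sample mean of 0.02 is a hypothetical input chosen only so that $p$ comes out near the 0.510 listed for Exxon and Altria in Table III, and the chi-squared statistic is written in the usual Pearson form with one degree of freedom, which is an assumption about how the test was carried out.

```python
import math

def p_from_mean(sample_mean):
    """Invert Eq. (22), mean = 2p - 1, to obtain p."""
    return (sample_mean + 1.0) / 2.0

def predicted_skew_kurt(p):
    """Population skewness and kurtosis from Eqs. (25) and (26)."""
    q = p * (1.0 - p)
    return (1.0 - 2.0 * p) / math.sqrt(q), (1.0 - 3.0 * q) / q

def chi_squared_vs_half(n_up, n_total):
    """Pearson chi-squared statistic for the hypothesis p = 1/2 (assumed form)."""
    expected = n_total / 2.0
    n_down = n_total - n_up
    return ((n_up - expected) ** 2 + (n_down - expected) ** 2) / expected

p = p_from_mean(0.02)                 # hypothetical sample mean of the +/-1 outcomes
skew, kurt = predicted_skew_kurt(p)   # approximately -0.040 and 1.0016
```

With $p = 0.510$ the predicted skewness of about $-0.040$ and kurtosis of about 1.0016 lie within the quoted uncertainties of the sample values for these two stocks in Table III.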

We also note that the variance of the distribution calculated using the values for $p$ given in
Table III in Eq. (24) ranges from 0.9994 to 1.000. This is in excellent agreement with the
requirement that the variance of the standardized daily yield is one when $a$ is one trading
day.

We thus conclude that the distribution of standardized daily yields is a generalized
Rademacher distribution shifted by the drift, $\hat{\mu}_n$. For 20 of these stocks, we find that
$p = 1/2$ at the 95% CL. The probability that the daily yield increases is the same as the
probability that it decreases. For the other four stocks, $p$ is slightly greater than 1/2, and
the probability that the daily yield increases is slightly larger than the probability that it
decreases.

VI. THE INSTANTANEOUS VOLATILITY 

Having determined the distribution for the standardized daily yield, we now turn our 
attention to determining the volatility of the stock. 

We find that while the recursion relation, Eq. (16), is straightforwardly solved using the
$\sigma_1$ given in Table II, there is a great deal of noise associated with the resultant values for $\sigma_n$.
This can be seen in Fig. 3a, where we have plotted as a function of trading day the volatility
obtained from Eq. (16). Although we can discern that there is an inherent structure in the
graph, this structure is buried within random fluctuations of $\sigma_n$. These fluctuations are due
to random noise generated when Eq. (16) is solved, and they mask the functional dependence
of $\sigma$ on $t$. In this section, we will extract this dependence from the noise.

The presence of the noise in $\sigma_n$ is inherent, but not because $\sigma(t)$ itself obeys a stochastic
process, as is assumed in stochastic volatility models. If it did, then there would necessarily
be a second stochastic differential equation for $\sigma(t)$ to augment Eq. (7), and the two coupled
equations would have to be solved simultaneously. Certainly, Eq. (7) and the recursion
relation Eq. (16) would not, in general, be solutions of the coupled stochastic differential
equations, and it is this recursion relation that was used to obtain Fig. 3a. Rather, this
noise is inherent in determining the volatility itself.

Note from Eq. (16) that $\sigma_n \propto \Delta u_n/a$. For a stochastic process of the form Eq. (3) where
the volatility changes with time, at each time step, $an$, $\Delta u_n$ is a random variable from a
distribution with volatility $\sigma_n$. As $\sigma_n$ need not equal $\sigma_m$ for any two $n$ and $m$, each $\Delta u_n$
can come from a different distribution. In the worst case, we will have only one $\Delta u_n$ out of
any distribution with which to determine $\sigma_n$, and this $\Delta u_n$ can take any value from $-\infty$ to
$+\infty$ with a probability

$$P(\Delta u_n/a) = \frac{1}{\sigma_n}\sqrt{\frac{a}{2\pi}}\,\exp\left[-\frac{(\Delta u_n/a - \mu_n)^2\, a}{2\sigma_n^2}\right]. \qquad (27)$$

Determining $\sigma_n$ would thus seem to be an impossible task. That it can nevertheless be
done is due to three observations. First, because $P(\Delta u_n/a)$ is Gaussian, there is a 68%
probability that any value of $\Delta u_n/a$ will be within $\mu_n \pm \sigma_n/\sqrt{a}$. It is for this reason that it
is still possible to discern an overall functional dependence of $\sigma_n$ on the trading day through
the noise in Fig. 3a. Second, $\sigma(t)$ is a deterministic function of $t$, and thus the value of the
volatility at time step $an$ is related to its value at time step $a(n - 1)$. Given a sufficient
number of $\Delta u_n$, and thus a sufficient number of $\sigma_n$, it must be possible to construct a
functional form for $\sigma(t)$. Third, using Fourier analysis (also called spectral analysis) and
signal processing techniques, it is possible to remove from Fig. 3a the noise that is obscuring
the details of how $\sigma_n$ depends on the trading day, and obtain a functional form for the
volatility.

FIG. 3: The top figure shows the volatility for Coca Cola obtained from the recursion relation
Eq. (16). The high degree of noise associated with this volatility can readily be seen. In Figs. 3b
and 3c, the Fourier sine and cosine components are graphed, and the noise floor for both can
readily be seen along with the points that are above the noise. From the inset in Fig. 3c, the
similarity in the noise floors for the sine and cosine coefficients is apparent.

That Fourier analysis provides an efficient way of removing the noise from Fig. 3a is based
on the following theorem:

Theorem: If $\{\xi_n : n = 1, \ldots, N\}$ is a time series where $\xi_n$ is a Gaussian random variable
with zero mean and volatility, $\sigma$, then the Fourier sine, $a_k^{\sin}$, and Fourier cosine, $a_k^{\cos}$,
coefficients of the Fourier transform of $\xi_n$ are Gaussian random variables with zero mean
and volatility $\sigma/\sqrt{N}$.

This theorem is well known in signal analysis, and is an immediate consequence of Parseval's
Theorem. A proof of this theorem, as well as a review of the discrete Fourier transform,
is given in Appendix C. It is because the volatility of the Fourier sine and cosine coefficients
for Gaussian random variables is reduced by a factor of $1/\sqrt{N}$ that it is possible to remove
the random noise from $\sigma_n$. In general, this reduction in the coefficients does not occur if the
$\xi_n$ are not random variables, and thus the structure in Fig. 3 can be resolved once the Fourier
transform of $\sigma_n$ is taken. After this removal is accomplished, we can then take the inverse
Fourier transform to obtain $\sigma(t)$, which we call the instantaneous volatility to differentiate
it from the $\sigma_n$ that comes directly from Eq. (16).
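The $1/\sqrt{N}$ reduction can be illustrated with a small simulation. The sketch below assumes the normalization $a_k^{\cos} = (1/N)\sum_n \xi_n \cos(2\pi k n/N)$ and checks only the scaling with $N$: quadrupling the length of a pure-noise series should roughly halve the spread of a given coefficient. The numerical prefactor depends on the normalization convention and is not tested here.

```python
import math
import random

def cosine_coefficient(series, k):
    """a_k^cos = (1/N) * sum_n series[n] * cos(2*pi*k*n/N) (assumed normalization)."""
    n_pts = len(series)
    return sum(x * math.cos(2 * math.pi * k * n / n_pts)
               for n, x in enumerate(series)) / n_pts

def coefficient_spread(n_pts, sigma, trials, rng):
    """Sample standard deviation of a_1^cos over many Gaussian-noise realizations."""
    values = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(n_pts)]
        values.append(cosine_coefficient(noise, 1))
    mean = sum(values) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (trials - 1))

rng = random.Random(0)
small = coefficient_spread(101, 1.0, 400, rng)
large = coefficient_spread(404, 1.0, 400, rng)   # four times as many points
ratio = small / large                            # expected to be close to sqrt(4) = 2
```

A deterministic component, by contrast, keeps the same coefficient size as $N$ grows, which is why it stands out above the noise.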

Figures 3b and 3c are plots of the Fourier sine and Fourier cosine coefficients of the discrete
Fourier transform of $\sigma_n$, defined as

$$a_k^{\sin} = \frac{1}{N_T}\sum_{n} \sigma_n \sin\left(\frac{2\pi k n}{N_T}\right), \qquad (28)$$

$$a_k^{\cos} = \frac{1}{N_T}\sum_{n} \sigma_n \cos\left(\frac{2\pi k n}{N_T}\right). \qquad (29)$$

They depend on an integer $k$, which runs from $-(N_T - 1)/2$ to $(N_T - 1)/2$. The Fourier
transform decomposes the time-series, $\sigma_n$, into components that oscillate with
frequency $f_k = k/N_T$ day$^{-1}$ (or, equivalently, with period $N_T/k$ days) for $k > 0$; the
coefficients $2|a_k^{\cos}|$ and $2|a_k^{\sin}|$ are the amplitudes of these oscillations.

In the graphs shown in Figs. 3b and 3c, we can readily see that there is a component of
the Fourier coefficients for Coca Cola that varies randomly between ±0.0002. This is the
noise floor. Coefficients in this floor are the result of the Fourier transform of the noise that
masks the functional dependence of $\sigma_n$ on the trading day. This noise floor is similar for both the
Fourier sine and cosine coefficients, as can be seen in detail in the inset plot in Fig. 3c, where the
features of the plot of $a_k^{\cos}$ are magnified for $k$ between ±1000.

It is also apparent from the graph that there are Fourier coefficients that rise above the
noise. While the most prominent of these is $a_0^{\cos}$ (which is the average of $\sigma_n$ over all trading
days), such points exist for other coefficients as well. This is due to the structure in $\sigma_n$
shown in Fig. 3a; if there were no structure at all in the plot, then there would be no Fourier
coefficients that rise above the noise floor.

By combining this observation with the near uniformity of the noise floor, we are able to
filter out the noise component of $\sigma_n$, and construct an (approximately) noise-free instantaneous
volatility, $\sigma(t)$. A description of the process that we used, along with the statistical
criteria used to determine the noise floor for the Fourier sine and cosine coefficients, is given
in detail in Appendix C 3. The effectiveness of the noise removal process can be seen in
Fig. 4, where a plot of the instantaneous volatility for Coca Cola is shown. When this plot
is compared to Fig. 3a, the amount of noise removed, and the success of the noise removal
procedure, is readily apparent. Indeed, out of a total of 21,523 Fourier sine and cosine
coefficients for $\sigma_n$, 10,734 Fourier sine and 10,728 Fourier cosine coefficients were removed
as noise; only 59 points were kept to construct $\sigma(t)$. While the graph of $\sigma(t)$ may appear
to be noisy, this is because eight decades of trading days are plotted in the figure. Much of
this apparent noise disappears when the range of trading days plotted is narrowed, as can
be seen in the inset figure. Here, the instantaneous volatility over a one-year period from
December 29, 2005 to December 29, 2006 has been plotted.

FIG. 4: The instantaneous volatility, $\sigma(t)$, after noise removal. In the main figure, the
instantaneous volatility of Coca Cola and the historical volatility of the stock calculated with
a 251-day moving average are plotted. The historical volatility consistently overestimates the
instantaneous volatility. The degree of this overestimation, along with the details that the
moving average misses, can be readily seen in the inset graph where both volatilities are
plotted over a one-year span from December 2, 2004 to December 1, 2005.
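The general idea of the filtering step can be sketched as below. This is not the procedure of Appendix C 3: the threshold rule (a multiple of the median coefficient magnitude), the synthetic volatility, and all names are assumptions made purely for illustration. Coefficients below the threshold are set to zero and the inverse transform is taken of what remains.

```python
import cmath
import math
import random

def dft(series):
    """Coefficients c_k = (1/N) sum_n x_n exp(+2*pi*i*k*n/N), k = 0..N-1."""
    n_pts = len(series)
    return [sum(x * cmath.exp(2j * math.pi * k * n / n_pts)
                for n, x in enumerate(series)) / n_pts
            for k in range(n_pts)]

def inverse_dft(coeffs):
    """Real part of sum_k c_k exp(-2*pi*i*k*n/N)."""
    n_pts = len(coeffs)
    return [sum(c * cmath.exp(-2j * math.pi * k * n / n_pts)
                for k, c in enumerate(coeffs)).real
            for n in range(n_pts)]

def denoise(series, threshold=3.5):
    """Zero every coefficient below `threshold` times the median magnitude."""
    coeffs = dft(series)
    magnitudes = sorted(abs(c) for c in coeffs[1:])
    floor = magnitudes[len(magnitudes) // 2]   # dominated by the noise floor
    kept = [c if (k == 0 or abs(c) > threshold * floor) else 0.0
            for k, c in enumerate(coeffs)]
    return inverse_dft(kept), sum(1 for c in kept if c != 0)

def rms(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(1)
n_pts = 201
# Hypothetical deterministic volatility with one slow oscillation, plus noise.
clean = [0.01 * (1 + 0.25 * math.sin(2 * math.pi * 3 * n / n_pts)) for n in range(n_pts)]
noisy = [c + rng.gauss(0.0, 0.002) for c in clean]
smooth, n_kept = denoise(noisy)
```

In this toy case only a handful of coefficients survive, and the filtered series lies much closer to the underlying deterministic function than the noisy one does.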

To compare the instantaneous volatility with the historical volatility, we have included in
Fig. 4 a graph of historical volatility calculated from the daily yield using a 251-day moving
average. It is immediately apparent that the historical volatility is generally larger than the 
instantaneous volatility; at times it is dramatically so. It is also readily apparent that the 
historical volatility does not show nearly as much detail as the instantaneous volatility, as 
can be seen in the inset figure. 
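For reference, a moving-window historical volatility of the kind used for comparison can be sketched as follows. The estimator (sample standard deviation with an $N - 1$ denominator over a trailing window) is an assumption; the exact estimator used for Fig. 4 is not specified in the text.

```python
import math

def historical_volatility(yields, window=251):
    """Trailing sample standard deviation of daily yields over `window` trading days."""
    out = []
    for end in range(window, len(yields) + 1):
        chunk = yields[end - window:end]
        mean = sum(chunk) / window
        var = sum((y - mean) ** 2 for y in chunk) / (window - 1)
        out.append(math.sqrt(var))
    return out
```

Because every output value averages over the whole window, any change in volatility that occurs on a time scale shorter than the window is smeared out.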

A functional form for $\sigma(t)$ can be found for all 24 stocks. For Coca Cola, this expression
has 59 terms; we give only four of them here,

$$\sigma(t) = 0.00950\left[1 - 0.25126\sin(2\pi f_0 t) + 0.12670\cos(2\pi f_0 t) + \cdots + 55\ \text{terms} + \cdots + 0.03870\cos(8735\,[2\pi f_0]\,t)\right], \qquad (30)$$

where $f_0 = 1/21522$ day$^{-1}$ is the fundamental frequency. The amplitude of the
first term in the expression is the largest; it is the average of $\sigma_n$ over all the trading days
in the time-series. The second largest amplitude is the sine term in Eq. (30), and it is 25%
the size of the first. All other amplitudes are smaller than this term, for most by a factor of
5, and yet notice from Fig. 4 that these amplitudes are nonetheless sufficient to generate an
instantaneous volatility that is far from a constant function.

From the last term in Eq. (30), we see that the highest frequency of the oscillations that
make up $\sigma(t)$ is $8735/21522 \approx 0.4$ day$^{-1}$. This is very close to the Nyquist criterion of 0.5
day$^{-1}$ for $\sigma(t)$, which is the upper limit on the frequencies of the Fourier components of
$\sigma(t)$. The underlying reason for such a limit is that the original time-series, $S_n$, was
acquired once each trading day. We therefore cannot measure oscillations with a period
shorter than two trading days; there simply is not enough information about the stocks to
determine what happens within the trading day. (This is in contrast to predicting how the
volatility may behave during the trading day, which certainly can be done.) For each of the
24 stocks, the shortest period of the Fourier components that make up the instantaneous
volatility is listed in Table IV, and we see that for all but 3 of the stocks our expression
for $\sigma(t)$ comes very close to the Nyquist criterion. In the case of Alcoa, Caterpillar, and Johnson
& Johnson, the shortest period has even reached it.
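The arithmetic behind this comparison is short enough to write out. The sketch below uses only the two numbers quoted from Eq. (30); the variable names are illustrative.

```python
n_fundamental = 21522      # denominator of f_0 in Eq. (30), in trading days
k_max = 8735               # harmonic index of the last term in Eq. (30)

f_max = k_max / n_fundamental      # highest frequency kept, in 1/day
period_min = 1.0 / f_max           # corresponding shortest period, in days
nyquist_frequency = 0.5            # 1/day, for one sample per trading day
nyquist_period = 1.0 / nyquist_frequency

print(round(f_max, 3), round(period_min, 2), nyquist_period)
```

The shortest period for Coca Cola comes out just under 2.5 trading days, in line with the entry for KO in Table IV, against a Nyquist period of 2 trading days.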
TABLE IV: Effectiveness of Noise Filtering Routine

                    a^sin noise             a^cos noise
Stock    Period     Floor      Kurtosis     Floor      Skewness    Kurtosis
         (days)     (× SD)                  (× SD)
CAT       2.0       3.33       3.07         3.36       -0.04       3.04
JNJ       2.0       3.20       2.99         3.23       -0.03       2.99
AA        2.0       3.40       3.00         3.47       -0.02       3.00
GE        2.1       3.47       3.01         3.44       -0.04       3.00
HPQ       2.1       3.45       3.00         3.31       -0.08       3.01
DIS       2.1       3.42       2.99         3.64       -0.03       3.03
XOM       2.2       3.79       3.11         3.61       -0.05       3.03
MSFT      2.3       3.51       2.99         3.27       -0.08       3.00
PFE       2.3       3.38       3.00         3.40       -0.05       3.00
AXP       2.4       3.56       3.17         3.33       -0.04       3.14
MRK       2.4       3.33       3.03         3.95       -0.03       3.00
INTC      2.4       3.26       3.00         3.36       -0.05       2.99
WMT       2.5       3.47       2.96         3.29        0.11       2.99
KO        2.5       3.89       3.30         3.53        0.02       3.23
BA        2.7       3.37       3.00         3.30       -0.01       2.99
PG        3.2       3.31       2.95         3.40       -0.04       3.00
AIG       3.2       3.44       3.00         3.32        0.01       3.00
MMM       3.5       3.03       2.54         3.06       -0.04       2.58
IBM       4.0       3.41       3.00         3.64        0.03       3.00
VZ        5.2       3.28       2.99         3.23       -0.09       3.00
MO        5.5       3.56       3.14         3.64       -0.07       3.21
C         6.2       3.36       3.02         3.33        0.05       3.12
DD        9.6       3.81       3.31         3.64        0.00       3.12
GM       14.2       3.59       3.00         3.64       -0.06       3.00


In Figs. 5 and 6 we have graphed the instantaneous volatility as a function of trading
day for all 24 stocks. They have been grouped into graphs where the degrees of volatility
are similar, with the stocks with roughly the highest volatility graphed last. Analytical
expressions for the other 23 stocks are not given as they are too lengthy.

With $\sigma(t)$ now known and the drift for the standardized daily yield obtained previously,
the drift for the daily yield, $\mu_n = \mu(na)$, can be found for all 24 stocks using the discretized
version of Eq. (8), $\mu_n = \sigma(na)\hat{\mu}_n$, where $\sigma(t)$ is the instantaneous volatility obtained
above. Since $|\hat{\mu}_n| < 1$ for all 24 stocks, $|\mu_n| < \sigma(na)$. Thus for all of the 24 stocks, the
drift of the stock is smaller than its volatility. This is to be expected. If the drift of
a stock is larger than the volatility, then future trends in the stock can be predicted with
a certain degree of certainty; the drift is, after all, a deterministic function of time. Such
trends could be seen by investors, and nearly riskless profits could be made. This clearly
does not happen. It is instead very difficult to discern future trends in the price of stocks,
and this is precisely because the volatility of the stock is so large.

VII. CONCLUDING REMARKS 

As a continuous process, we have found that the 24 DJIA stocks can be described as a
stochastic process with a volatility that changes deterministically with time. It is a process
for which the autocorrelation function of the yield vanishes at different times, and thus one
that describes a stock that is efficiently priced. From the results of our calculation of
the autocorrelation function of the daily yield for the 24 stocks, this property of our stochastic
process is in very good agreement with how these stocks are priced by the market. It is also
a process for which the stochastic differential equation can be, at least formally,
solved. This solution is valid only because the volatility is a deterministic function of time,
however. If the volatility depends on the stock price, or if the volatility itself is a stochastic
process, the solution of the stochastic differential equation will not be so simple, and the
autocorrelation function need not vanish at different times.

FIG. 5: The Instantaneous Volatility for the DJIA Stocks, I. Graphs of the instantaneous
volatilities versus trading day for the 12 of the 24 DJIA stocks with the lowest peak volatility
are shown.

FIG. 6: Graphs of the instantaneous volatilities versus trading day for the 12 of the 24 DJIA stocks
with the highest peak volatility are shown.

It is, however, only after using the discretized stochastic process that we are able to
validate our model. After correcting for the variability of the volatility by using the
standardized daily yield, we have shown that for all 24 stocks the distribution of standardized
daily yields is well described by the generalized Rademacher distribution. Indeed, we found
that the abnormally large kurtosis is due to a volatility that changes with time. For 20
of the 24 stocks, the sample skewness, kurtosis, and probability distribution agree with a
Rademacher distribution where $p = 1/2$ at the 95% CL; the probability that these stocks
will increase on any one day is thus equal to the probability that they will decrease. The other
four stocks agree with a generalized Rademacher distribution and have a $p$ slightly greater
than 1/2. For these stocks, the probability that the yield will increase on any one day is
slightly higher than the probability that it will decrease. We conclude that our model is a
very good description of the behavior of these stocks.

That the kurtosis for the standardized daily yield is smaller than the kurtosis for the
daily yield is in agreement with the results found by Rosenberg (1972). The daily yield is
time dependent, and is thus a nonstationary random variable, while for the standardized
daily yield, the time dependence due to the volatility has been taken into account. Indeed, in
many ways Rosenberg (1972) presages the results of this work.

By combining the properties of our continuous stochastic process for the stocks with 
noise removal techniques, we have been able to determine the time dependence of both 
the volatility and the drift of all 24 stocks. Unlike the implied volatility, the volatility 
obtained here was obtained from the daily close directly without the need to fit parameters 
to the market price of options. The theory is thus self-contained. For Alcoa, Caterpillar,
and Johnson & Johnson, the time dependence of the volatility can be determined down to a
resolution of a single trading day, while for another 13 stocks, it can be determined to a
resolution of less than 1 1/2 trading days. While other, more sophisticated signal analysis
techniques can be used, given that the time-series is based on the daily close and thus the 
resolution is ultimately limited to a period of two trading days, we do not expect that it 
will be possible to dramatically improve on these results. Only when intraday price data is 
used will we expect significant improvement to this resolution. Indeed, with intraday data 
we expect that changes to the volatility that occur during the trading day can be seen. 

We have deliberately used large-cap stocks in our analysis, and we take care to note that
this approach to the analysis of the temporal behavior of stocks has only been shown to be
valid for the 24 stocks we analyzed here. While we would expect it to be applicable to other
large-cap stocks, whether our approach will also be valid when applied to mid- or small-cap 
stocks is still an open question. Indeed, it will be interesting to see the range of stocks for 
which the volatility depends solely on time. 

With both the drift and the volatility determined down nearly to the single trading day
level for most of the stocks, it is now possible to calculate the autocorrelation function for
both, as well as the correlation function between the drift and the volatility. In particular,
the degree of influence that the volatility or drift on any one day has on the volatility or
drift on any future day can be determined. This analysis is currently being done.

Appendix A: Preparing the Time-series 

The time-series for the 24 DJIA stocks analyzed here were obtained from the Center for
Research in Security Prices (CRSP). While the ending date for each series is December 29,
2006, the choice of the starting date is often different for different stocks. This choice of
starting dates was not governed by a desire for uniformity, but rather by the desire to include
as many trading days in the time series as possible, and thereby minimize standard errors.
In addition, by maximizing the number of trading days included, we also demonstrate that
our model is valid over the entire period for which the prices of the stock are available.

Although the daily close of stocks between the starting and ending dates are used as the 
basis of the time-series (with dividends included in the price), a series of adjustments to 
the CRSP data were made when the series were constructed. If the closing price of stock is 
listed by CRSP as a negative number — an indication that the closing price was not available 
on that day, and the average of the last bid and ask prices was used instead — we took the 
positive value of this number as the daily close on that day. If no record of the daily close 
was given for a particular trading day at all — an indication that the bidding and asking 
prices were also not available for that date — we used the average of the closing price of the 
stock on the day preceding and the day following as the daily close for that day. 

Next, the daily close of each stock was scaled to adjust for splits in the stock. For
example, although the daily close for Coca Cola on December 12, 1925 is listed by CRSP as
$153.625, this price was scaled by a factor of 6745.134 to account for the accumulated splits
that the stock has gone through since 1925. The price recorded in the time-series is instead 
0.022776. Because of this scale factor, the prices of stocks are listed in all time-series to 
an accuracy of at least 10~^ to ensure that the daily close on any day can be reconstructed 
from the time-series. This level of accuracy or higher was then used in all the calculations 
in this paper. While we could have avoided this subtlety by scaling the daily close, $48.25,
of the stock on December 29, 2006 by 6745.134, doing so would result in stock prices that
are ~$300K, which is deceptively large.

Finally, from Eq. (16) we see that $\sigma_n$ vanishes if $\Delta u_n$ vanishes, and yet from Eq. (15), it
is implicit that the quotient $\Delta u_n/\sigma_n$ must be well defined. Indeed, the reduction of Eq. (3)
to Eq. (7) is only valid if $\sigma(t)$ is nowhere zero. In practice, there are trading days on which
the daily yield vanishes; for Coca Cola, this occurred on 2070 out of a total of 21,523 trading
days. To ensure that $\Delta u_n/\sigma_n$ is well defined on these days, we have added to the daily close
a random number less than 0.00005 if the closes on successive days are equal. This is done
before the daily close is scaled to adjust for stock splits. Since the stock price changes by
at least $0.01 increments, doing so does not materially change the stock price, while still
ensuring that $\Delta u_n \neq 0$.
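The adjustments described in this appendix can be sketched as follows. The input convention (a list of closes with `None` marking a missing record), the function name, and the handling of only isolated, interior missing days are assumptions made for illustration; the split factor is applied as a single overall divisor, as in the Coca Cola example above.

```python
import random

def prepare_closes(raw_closes, split_factor, rng, jitter=0.00005):
    """Clean CRSP-style daily closes; None marks a day with no record (assumed isolated)."""
    # Negative prices flag a bid/ask average: use the positive value.
    closes = [abs(c) if c is not None else None for c in raw_closes]
    # Missing record: average the closes of the preceding and following days.
    for i, c in enumerate(closes):
        if c is None:
            closes[i] = 0.5 * (closes[i - 1] + closes[i + 1])
    # Equal successive closes: add a random number strictly between 0 and jitter.
    for i in range(1, len(closes)):
        if closes[i] == closes[i - 1]:
            closes[i] += (0.1 + 0.8 * rng.random()) * jitter
    # Finally, scale by the accumulated split factor.
    return [c / split_factor for c in closes]

series = prepare_closes([-153.625, None, 153.625, 153.625], 6745.134, random.Random(1))
```

The first entry of `series` reproduces the scaled price of 0.022776 quoted above, and no two successive entries are equal, so the daily yield never vanishes.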

We have not adjusted for inflation in our time-series, nor have we accounted for weekends,
holidays, or any other days on which trading did not take place. We have instead
concatenated the daily close on each trading day, one after another, when constructing the
time-series. The time-series are thus a sequence of trading days, and not calendar days.
While this concatenation is natural, issues of bias such as those studied by Fleming, Kirby,
and Ostdiek (2006) have not been taken into account. Whether these issues are relevant
for the stocks considered here we leave for further study. Our focus is instead on the gross
features of the stock price.

Finally, we list here the following particularities that occurred in our analysis of the 24 
DJIA stocks. 

Boeing: When solving for $\sigma_n$, the $n = 2$ term was greater than 27, while all other terms
were less than 0.5. This data point was an outlier, and since it is in the transient region for
the stock, we have set this term equal to 0.1, which is the typical size of $\sigma_n$ for $n \neq 2$.

Merck: When solving for $\sigma_n$, the $n = 2$ term was greater than 168, while all other terms
were three orders of magnitude smaller. This data point was replaced by 0.2, which is the
typical size of $\sigma_n$ for $n \neq 2$.

Exxon-Mobil: When solving for $\sigma_n$, the $n = 2$ term was greater than 158, while the $n = 3$
term was greater than 1500. The $n = 2$ data point was replaced by 0.1, and the $n = 3$ data
point was replaced by 0.009.

Appendix B: Statistics 

In this section, we collect the expressions used here in calculating the mean, variance,
skewness, kurtosis, and autocorrelation of the time-series, along with their respective
standard errors. With the exception of the autocorrelation function, these expressions are taken
from Stuart and Ord (1994).

1. Moments and Standard Errors

Given a collection of $N$ data points, $x_n$, the sample moments, $m_k$, of order, $k$, that are
used in our analysis are defined as follows

As usual, the sample skewness and kurtosis are defined as

$$\mathrm{skew} = \frac{m_3}{m_2^{3/2}}, \qquad \mathrm{kurt} = \frac{m_4}{m_2^{2}}. \qquad (B2)$$

While the standard errors of the mean and the variance are well known,
the standard errors of the sample skewness and kurtosis are not. For the skewness, this error
is

$$S_{\mathrm{skew}} = \frac{1}{\sqrt{N}}\left\{\frac{m_6}{m_2^3} - 6\frac{m_4}{m_2^2} + 9 + \frac{m_3^2}{4 m_2^3}\left(9\frac{m_4}{m_2^2} + 35\right) - 3\frac{m_5 m_3}{m_2^4}\right\}^{1/2}, \qquad (B4)$$

while for the sample kurtosis, the standard error is

$$S_{\mathrm{kurt}} = \frac{1}{\sqrt{N}}\left\{\frac{m_8}{m_2^4} - 4\frac{m_4 m_6}{m_2^5} + 4\left(\frac{m_4}{m_2^2}\right)^3 - \left(\frac{m_4}{m_2^2}\right)^2 + 16\frac{m_3^2 m_4}{m_2^5} - 8\frac{m_3 m_5}{m_2^4} + 16\frac{m_3^2}{m_2^3}\right\}^{1/2}. \qquad (B5)$$

Although standard errors are defined in terms of the population moments, these moments
are not known a priori. Following Stuart and Ord (1994), we have used instead the sample
moments listed in Eq. (B1) when calculating standard errors.
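A direct transcription of Eqs. (B2), (B4), and (B5) is sketched below. It assumes that the sample central moments are normalized by $1/N$, which is an assumption about Eq. (B1); the function names are illustrative.

```python
import math

def central_moment(data, k):
    """k-th sample central moment with a 1/N normalization (assumed)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skew_kurt_with_errors(data):
    """Sample skewness and kurtosis with standard errors from Eqs. (B4) and (B5)."""
    n = len(data)
    m2, m3, m4, m5, m6, m8 = (central_moment(data, k) for k in (2, 3, 4, 5, 6, 8))
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    var_skew = (m6 / m2**3 - 6 * m4 / m2**2 + 9
                + m3**2 / (4 * m2**3) * (9 * m4 / m2**2 + 35)
                - 3 * m3 * m5 / m2**4)
    var_kurt = (m8 / m2**4 - 4 * m4 * m6 / m2**5 + 4 * (m4 / m2**2) ** 3
                - (m4 / m2**2) ** 2 + 16 * m3**2 * m4 / m2**5
                - 8 * m3 * m5 / m2**4 + 16 * m3**2 / m2**3)
    # Guard against tiny negative values produced by rounding.
    se_skew = math.sqrt(max(var_skew, 0.0) / n)
    se_kurt = math.sqrt(max(var_kurt, 0.0) / n)
    return skew, se_skew, kurt, se_kurt
```

For a balanced sequence of $\pm 1$ values the skewness error reduces to $2/\sqrt{N}$ and the kurtosis error to zero, which is consistent with the very small kurtosis uncertainties seen in Table III.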




2. The Autocorrelation Function 

For the time-series, $x_n$, where $n = 1, \ldots, N$, we define the autocorrelation of $x_n$ to be

$$G^{(N)}(x_N, x_{N-M}) = \frac{1}{N-M}\sum_{i=M+1}^{N}\left(x_i - \frac{1}{N-M}\sum_{k=M+1}^{N} x_k\right)\left(x_{i-M} - \frac{1}{N-M}\sum_{k=M+1}^{N} x_{k-M}\right). \qquad (B6)$$

Equation (B6) measures the correlation of the time-series at time step $N$ with the time-series
at time step $N - M$. This definition differs somewhat from the one given in Kendall (1953)
and in Kendall, Stuart and Ord (1983) in that they divide $G^{(N)}(x_N, x_{N-M})$ by the product
of the volatility of the time series, $\{x_n : n = M + 1, \ldots, N\}$, with the volatility of the
time-series, $\{x_{n-M} : n = M + 1, \ldots, N\}$. It also differs substantially from the expression
used in Alexander (2001), where a simplified expression for the autocorrelation function in
Kendall, Stuart, and Ord (1983) is used.

We use Eq. (B6) instead of the expressions given in Kendall, Stuart, and Ord (1983) and
Alexander (2001) for two reasons. First, $G^{(N)}(x_N, x_N)$ is simply the variance of the
time-series, so that the volatility for a stock can be read off easily from its graph, as can be seen
in Fig. 1. Second, we will see below that the variance for $G^{(N)}(x_N, x_{N-M})$ is easily calculated
when $x_n$ are Gaussian random variables, and the standard error for $G^{(N)}(x_N, x_{N-M})$ can
be readily determined. Derivations of the standard error for the autocorrelation functions
given in Kendall, Stuart, and Ord (1983) and Alexander (2001), on the other hand, are more
involved.

To determine the standard error for $G^{(N)}(x_N, x_{N-M})$, consider a time-series where the $x_n$
are Gaussian random variables with mean zero and standard deviation, $\sigma$. Then $E[x_i] = 0$
while $E[x_i x_k] = \sigma^2\delta_{ik}$. (Here, $\delta_{ik}$ is the Kronecker delta with $\delta_{ik} = 1$ if $i = k$ while
$\delta_{ik} = 0$ otherwise.) Consequently, $E[G^{(N)}(x_N, x_{N-M})] = 0$ when $M > 0$, as can be seen from
Eq. (B6). We thus only have to calculate

$$E\left[\{G^{(N)}(x_N, x_{N-M})\}^2\right] = \frac{1}{(N-M)^2}\sum_{i=M+1}^{N}\sum_{k=M+1}^{N} E[x_i x_{i-M} x_k x_{k-M}]. \qquad (B7)$$

Since the $x_n$ are Gaussian random variables,

$$E[x_i x_{i-M} x_k x_{k-M}] = E[x_i x_{i-M}]E[x_k x_{k-M}] + E[x_i x_k]E[x_{i-M} x_{k-M}] + E[x_i x_{k-M}]E[x_{i-M} x_k]. \qquad (B8)$$

Thus,

$$E\left[\{G^{(N)}(x_N, x_{N-M})\}^2\right] = \frac{\sigma^4}{(N-M)^2}\sum_{i,k=M+1}^{N}\left(\delta_{i,i-M}\,\delta_{k,k-M} + \delta_{ik}\,\delta_{i-M,k-M} + \delta_{i,k-M}\,\delta_{i-M,k}\right). \qquad (B9)$$

The first term vanishes since $M > 0$, while the third term vanishes because it requires that
$i = k - M$ and $i - M = k$; this can only happen when $M = 0$. We are thus left with only
the second term, so that

$$E\left[\{G^{(N)}(x_N, x_{N-M})\}^2\right] = \frac{\sigma^4}{N-M}. \qquad (B10)$$

The standard error, /\G^'^\xis[,xm-m)^ for G'^'^\xm,xm-m) when M > is then simply 

AG(^)(x.,x._m) = ^^^^, (Bll) 

where we have used the fact that G*-^-* (xat, xn) is the variance of the time-series. The standard 
error for G^'^\xn,Xn-m) when M = can then be found from Eq. (1B3I) after remembering 
that ?7i4 = 3o"'^ for a Gaussian distribution. Note that differences between Eq. (IBlip and 



the standard error found in Kendall, Stuart, and Ord (1983) are due mainly to our defining 
G^'^\xj\f,XN-M) with the factor 1/{N — M) instead of the factor 1/A^ used by them. 

Although the standard error for $G^{(2)}(x_N, x_{N-M})$ when $x_n$ is not a Gaussian random
variable can be found for special cases (see Kendall, Stuart, and Ord 1983), Eq. (B11) is
sufficient for our purposes. If the market is efficient, we expect the autocorrelation function
for the standard yield to vanish for $M > 0$. As this expectation is borne out by Fig. 1, we
hypothesize that the autocorrelation function in Fig. 1 is not identically zero when $T > 0$
because of sampling errors, which in turn are due to Gaussian random variables with
zero mean. We would therefore expect Eq. (B11) to be a good description of the standard
error of this autocorrelation function, and indeed, this expectation is consistent with the
results listed in Table I.
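As a rough numerical check of Eq. (B11), the sketch below simulates independent Gaussian series and compares the root-mean-square of the lag-$M$ autocovariance with $\sigma^2/\sqrt{N-M}$. It assumes that Eq. (B6) defines $G^{(2)}$ as the sum of $x_i x_{i-M}$ over $i = M+1, \ldots, N$ divided by $N - M$. The function names, sample sizes, and seed are illustrative choices, not taken from the paper.

```python
import math
import random


def autocovariance(x, lag):
    """Lag-M autocovariance with the 1/(N - M) normalisation (assumed form of Eq. B6)."""
    n = len(x)
    return sum(x[i] * x[i - lag] for i in range(lag, n)) / (n - lag)


def standard_error(x, lag):
    """Eq. (B11): lag-0 autocovariance divided by sqrt(N - M), intended for lag > 0."""
    return autocovariance(x, 0) / math.sqrt(len(x) - lag)


def simulated_spread(n, lag, sigma, trials, seed=0):
    """Root-mean-square of the lag-M autocovariance over independent Gaussian series."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += autocovariance(x, lag) ** 2
    return math.sqrt(total / trials)


if __name__ == "__main__":
    n, lag, sigma = 200, 3, 2.0
    print("simulated:", simulated_spread(n, lag, sigma, trials=1500))
    print("Eq. (B11):", sigma ** 2 / math.sqrt(n - lag))
```

With a few thousand trials the two printed numbers should agree to within a few percent; the residual difference is itself a sampling error.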

Appendix C: Fourier Analysis 

In this appendix, we review the properties of the Fourier transform needed in the analysis 
we present in this paper. While much of this is well-known, our purpose here is to establish
the notation used in the paper, and to review the properties of Fourier series needed. At
the end of this section, we will also show that the Fourier transform of a Gaussian random
variable is once again a Gaussian random variable, and describe the method used to remove
the noise from $\sigma_n$.

1. The Discrete Fourier Transform 

Consider a time-series, $x_n$, such that $n = N_{min}, \ldots, N_{max}$; the total number of data
points in the time-series is then $N = N_{max} - N_{min} + 1$. In the analysis below, we will assume
that $N$ is an odd number. As there are at least 5,000 trading days in our time-series, we
can always change the starting point of a time-series by one trading day to ensure that there
are an odd number of terms in the series; this assumption is thus not restrictive.

The expansion of the time-series in a Fourier series is defined as 

$$x_n = \sum_{k=-(N-1)/2}^{(N-1)/2} \hat{x}_k\, e^{-2\pi i k n/N}, \tag{C1}$$

where $i = \sqrt{-1}$. The quantity, $\hat{x}_k$, is called the Fourier transform of $x_n$. As

$$\exp\left(-\frac{2\pi i k n}{N}\right) = \cos\left(\frac{2\pi k n}{N}\right) - i \sin\left(\frac{2\pi k n}{N}\right), \tag{C2}$$

in taking the Fourier series of $x_n$ we have decomposed $x_n$ into terms that oscillate with
definite period, $T_k = N/k$, for $k > 0$, and have a definite amplitude, $|\hat{x}_k|$. This transform is
thus a natural method of characterizing how a time-series changes with time.

The amplitude of these oscillations, $\hat{x}_k$, is a complex number in general. The original
time-series, $x_n$, is real, however, and this fact must also be reflected in $\hat{x}_k$. How it is reflected
can be seen by taking the complex conjugate of Eq. (C1),

$$\bar{x}_n = \sum_{k=-(N-1)/2}^{(N-1)/2} \bar{\hat{x}}_k\, e^{2\pi i k n/N}, \tag{C3}$$

where the complex conjugate is denoted by a bar. Since $\bar{x}_n = x_n$, by comparing Eq. (C1)
with Eq. (C3) we find, after taking $k \to -k$ in Eq. (C1), the reality condition $\bar{\hat{x}}_k = \hat{x}_{-k}$ that
the Fourier transform must satisfy.

The transform Eq. (C1) is invertible. Namely, we can express $\hat{x}_k$ in terms of $x_n$ by taking
the following sum

$$\sum_{n=N_{min}}^{N_{max}} x_n\, e^{2\pi i k n/N} = \sum_{k'=-(N-1)/2}^{(N-1)/2} \hat{x}_{k'} \sum_{n=N_{min}}^{N_{max}} e^{2\pi i (k-k') n/N} \tag{C4}$$

of Eq. (C1). To evaluate this sum, we consider first the case where $k - k' \ne qN$ for any
integer $q$. The series on the right can then be summed to give

$$\sum_{n=N_{min}}^{N_{max}} e^{2\pi i (k-k') n/N} = e^{2\pi i (k-k') N_{min}/N}\, \frac{1 - e^{2\pi i (k-k')}}{1 - e^{2\pi i (k-k')/N}} \tag{C5}$$

after using the following identity for the geometric series,

$$\sum_{n=0}^{N-1} r^n = \frac{1 - r^N}{1 - r}. \tag{C6}$$

As $e^{2\pi i (k-k')} = 1$, while $e^{2\pi i (k-k')/N} \ne 1$, we conclude that Eq. (C5) vanishes in this case. We
next consider the case when $k - k' = qN$. Each term in the sum is then one, and Eq. (C5)
is easily summed to give $N$.

Combining these two results, we find that

$$\sum_{n=N_{min}}^{N_{max}} e^{2\pi i (k-k') n/N} = N \delta_{kk'}. \tag{C7}$$

We then conclude from Eq. (C4) that

$$\hat{x}_k = \frac{1}{N} \sum_{n=N_{min}}^{N_{max}} x_n\, e^{2\pi i k n/N}. \tag{C8}$$

This is the inverse Fourier transform of $x_n$. In particular, notice that when $k = 0$,

$$\hat{x}_0 = \frac{1}{N} \sum_{n=N_{min}}^{N_{max}} x_n \tag{C9}$$

is simply the average of $x_n$ over the whole time-series.
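The properties derived so far can be checked numerically with a minimal sketch of the transform pair. It assumes the sign convention of Eq. (C1) and a $1/N$ normalisation of the coefficients, which is what the statement that the $k = 0$ coefficient equals the average of $x_n$ implies. The function names and the `n_min` argument are hypothetical.

```python
import cmath


def fourier_transform(x, n_min=0):
    """Coefficients for k = -(N-1)/2, ..., (N-1)/2 of a series x_n with n = n_min, n_min+1, ..."""
    n_total = len(x)
    if n_total % 2 == 0:
        raise ValueError("the number of points N must be odd")
    half = (n_total - 1) // 2
    return {
        k: sum(
            value * cmath.exp(2j * cmath.pi * k * (n_min + j) / n_total)
            for j, value in enumerate(x)
        ) / n_total
        for k in range(-half, half + 1)
    }


def fourier_series(x_hat, n, n_total):
    """Eq. (C1): rebuild x_n from its coefficients."""
    return sum(c * cmath.exp(-2j * cmath.pi * k * n / n_total) for k, c in x_hat.items())


if __name__ == "__main__":
    x = [3.0, 1.0, 4.0, 1.0, 5.0]
    x_hat = fourier_transform(x)
    print([round(fourier_series(x_hat, n, len(x)).real, 12) for n in range(len(x))])
    print("k = 0 coefficient:", x_hat[0].real, "average:", sum(x) / len(x))
```

Storing the coefficients in a dictionary keyed by $k$ keeps the symmetric index range of Eq. (C1) explicit, at the cost of the speed a fast Fourier transform would give.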

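The Fourier series Eq. (C1) can also be expressed as an explicitly real expansion,

$$x_n = \sum_{k=-(N-1)/2}^{(N-1)/2} \left\{ \alpha_k^{\cos} \cos\left(\frac{2\pi k n}{N}\right) + \alpha_k^{\sin} \sin\left(\frac{2\pi k n}{N}\right) \right\}, \tag{C10}$$

where the real and imaginary parts of $\hat{x}_k = \alpha_k^{\cos} + i \alpha_k^{\sin}$,

$$\alpha_k^{\cos} = \frac{1}{N} \sum_{n=N_{min}}^{N_{max}} x_n \cos\left(\frac{2\pi k n}{N}\right), \qquad \alpha_k^{\sin} = \frac{1}{N} \sum_{n=N_{min}}^{N_{max}} x_n \sin\left(\frac{2\pi k n}{N}\right), \tag{C11}$$

are called the Fourier cosine and Fourier sine coefficients, respectively. While analytical
calculations are more easily done with Eq. (C1), numerical calculations are necessarily done
with Eq. (C10), and it is on the Fourier sine and cosine coefficients that we will focus most
of our analysis in this paper.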
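A minimal sketch of the real expansion follows. It assumes the cosine and sine coefficients are the real and imaginary parts of the complex coefficients under the sign convention of Eq. (C1), with the sum running over all $k$ from $-(N-1)/2$ to $(N-1)/2$. The function names are illustrative.

```python
import math


def cosine_sine_coefficients(x):
    """Fourier cosine and sine coefficients of a real series of odd length N."""
    n_total = len(x)
    half = (n_total - 1) // 2
    cos_c, sin_c = {}, {}
    for k in range(-half, half + 1):
        angles = [2.0 * math.pi * k * n / n_total for n in range(n_total)]
        cos_c[k] = sum(v * math.cos(a) for v, a in zip(x, angles)) / n_total
        sin_c[k] = sum(v * math.sin(a) for v, a in zip(x, angles)) / n_total
    return cos_c, sin_c


def real_series(cos_c, sin_c, n, n_total):
    """Explicitly real expansion: sum of cosine and sine terms over all k."""
    return sum(
        cos_c[k] * math.cos(2.0 * math.pi * k * n / n_total)
        + sin_c[k] * math.sin(2.0 * math.pi * k * n / n_total)
        for k in cos_c
    )


if __name__ == "__main__":
    x = [0.2, -0.1, 0.4, 0.3, -0.5, 0.1, 0.0]
    cos_c, sin_c = cosine_sine_coefficients(x)
    print([round(real_series(cos_c, sin_c, n, len(x)), 12) for n in range(len(x))])
```

Only real arithmetic is used, which is the practical reason a numerical implementation would favour this form.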

2. Fourier Transforms of Gaussian Random Variables 



We now prove the theorem stated in Sec. IVII for the special case when $N$ is an odd
number. Although the theorem holds in general, this is the only case we need here.

Because each $\xi_n$ in the time-series is a Gaussian random variable with zero mean and
variance, $\sigma^2$, the probability distribution for the time-series is just



a^\27rJ J- J- a^ \27iJ 

n=l 
where a is the time interval between successive points in the time series, and in the last
equality we have used $E[\xi_n \xi_m] = 0$ for $n \ne m$. Expanding $\xi_n$ in a Fourier series using
Eq. (C1), we find that

$$\sum_{n=1}^{N} \xi_n^2 = \sum_{n=1}^{N} \xi_n \bar{\xi}_n = \sum_{k,k'} \hat{\xi}_k \bar{\hat{\xi}}_{k'} \sum_{n=1}^{N} e^{-2\pi i (k-k') n/N} = N \sum_{k,k'} \hat{\xi}_k \bar{\hat{\xi}}_{k'}\, \delta_{kk'}, \tag{C13}$$

where the last equality holds from Eq. (C7). Thus,

$$\sum_{n=1}^{N} \xi_n^2 = N \sum_{k=-(N-1)/2}^{(N-1)/2} |\hat{\xi}_k|^2, \tag{C14}$$

which is Parseval's Theorem for a discrete Fourier series. Following Eq. (C11), we express
$|\hat{\xi}_k|^2 = (\alpha_k^{\cos})^2 + (\alpha_k^{\sin})^2$ in Eq. (C14). Then Eq. (C12) can be written as

$$P \propto \prod_{k=-(N-1)/2}^{(N-1)/2} \exp\left\{ -\frac{(\alpha_k^{\cos})^2 + (\alpha_k^{\sin})^2}{2 \left(\sigma/\sqrt{N}\right)^2} \right\}, \tag{C15}$$

and the theorem is proved. 

It is straightforward to see that the converse is also true. Namely, if $\alpha_k^{\cos}$ and $\alpha_k^{\sin}$ are
Gaussian random variables with zero mean and volatility $\sigma/\sqrt{N}$, then $x_n$ is a Gaussian
random variable with zero mean and volatility, $\sigma$.

Notice from Eq. (C15) that $E[\alpha_k^{\cos} \alpha_k^{\sin}] = 0$, and thus the two random variables are
independent. Notice also that while we began with $N$ degrees of freedom with the random
variables, $x_n$, we seem to have ended up with $2N - 1$ degrees of freedom for the random
variables, $\alpha_k^{\cos}$ and $\alpha_k^{\sin}$. From Eq. (C11) we see, however, that $\alpha_k^{\cos} = \alpha_{-k}^{\cos}$ and $\alpha_k^{\sin} = -\alpha_{-k}^{\sin}$;
not all the variables in Eq. (C15) are independent. When this redundancy is taken into
account, we arrive back at $N$ degrees of freedom.
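Two statements in this subsection lend themselves to a quick numerical check: Parseval's theorem for the discrete series, and the count of independent coefficients once the symmetry under $k \to -k$ is used. The sketch below assumes coefficients normalised by $1/N$, so that Parseval's theorem reads: the sum of $\xi_n^2$ equals $N$ times the sum of the squared coefficient amplitudes. The helper names and the random seed are illustrative.

```python
import cmath
import random


def fourier_coefficients(xi):
    """Complex coefficients for k = -(N-1)/2, ..., (N-1)/2 with a 1/N normalisation."""
    n_total = len(xi)
    half = (n_total - 1) // 2
    return {
        k: sum(v * cmath.exp(2j * cmath.pi * k * n / n_total) for n, v in enumerate(xi)) / n_total
        for k in range(-half, half + 1)
    }


def parseval_sides(xi):
    """Return (sum of xi_n^2, N * sum of |coefficient|^2); the two should agree."""
    coeffs = fourier_coefficients(xi)
    lhs = sum(v * v for v in xi)
    rhs = len(xi) * sum(abs(c) ** 2 for c in coeffs.values())
    return lhs, rhs


def independent_components(n_total):
    """Count the k = 0 cosine term plus one cosine and one sine term for each k > 0."""
    half = (n_total - 1) // 2
    return 1 + 2 * half


def gaussian_series(n_total, sigma, seed=0):
    """Independent zero-mean Gaussian variables, used only as test input."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_total)]


if __name__ == "__main__":
    xi = gaussian_series(101, 1.5)
    print(parseval_sides(xi))
    print(independent_components(101))
```

For odd $N$ the count returns $N$, matching the number of degrees of freedom of the original series.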

3. Removal of Noise 

In this subsection, we will describe how the noise present in $\sigma_n$ is removed, and how $\sigma(t)$
is obtained.

Given that the noise floor associated with the Fourier sine and cosine coefficients is
constant over all frequencies, $f_k$, noise removal is straightforward. We need only remove
from $\mathcal{F}^{\sin}$, the set of all Fourier sine coefficients for $\sigma_n$, and $\mathcal{F}^{\cos}$, the set of all Fourier cosine
coefficients for $\sigma_n$, those coefficients whose amplitudes are less than the amplitudes, $S_{noise}$ and
$C_{noise}$, of the noise floor for the Fourier sine and Fourier cosine coefficients, respectively. The
coefficients left over, $\mathcal{F}^{\sin}_{signal} = \{\alpha_k^{\sin} \in \mathcal{F}^{\sin} : |\alpha_k^{\sin}| > S_{noise}\}$ for the Fourier sine coefficients
and $\mathcal{F}^{\cos}_{signal} = \{\alpha_k^{\cos} \in \mathcal{F}^{\cos} : |\alpha_k^{\cos}| > C_{noise}\}$ for the Fourier cosine coefficients, can then be
used to construct the instantaneous volatility, $\sigma(t)$, by summing the Fourier series Eq. (29).
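The thresholding step can be sketched as follows. This is not the authors' program: it assumes the real cosine and sine expansion of Appendix C is the series being summed, it takes the noise floors as absolute amplitudes, and all function names are hypothetical.

```python
import math


def cosine_sine_coefficients(x):
    """Fourier cosine and sine coefficients for k = -(N-1)/2, ..., (N-1)/2 (N odd)."""
    n_total = len(x)
    half = (n_total - 1) // 2
    cos_c, sin_c = {}, {}
    for k in range(-half, half + 1):
        angles = [2.0 * math.pi * k * n / n_total for n in range(n_total)]
        cos_c[k] = sum(v * math.cos(a) for v, a in zip(x, angles)) / n_total
        sin_c[k] = sum(v * math.sin(a) for v, a in zip(x, angles)) / n_total
    return cos_c, sin_c


def keep_signal(coeffs, floor):
    """Keep only the coefficients whose amplitude exceeds the noise floor."""
    return {k: c for k, c in coeffs.items() if abs(c) > floor}


def denoise(x, s_noise, c_noise):
    """Sum the series again using only the coefficients above the two floors."""
    n_total = len(x)
    cos_c, sin_c = cosine_sine_coefficients(x)
    cos_sig = keep_signal(cos_c, c_noise)
    sin_sig = keep_signal(sin_c, s_noise)
    cleaned = []
    for n in range(n_total):
        value = sum(c * math.cos(2.0 * math.pi * k * n / n_total) for k, c in cos_sig.items())
        value += sum(c * math.sin(2.0 * math.pi * k * n / n_total) for k, c in sin_sig.items())
        cleaned.append(value)
    return cleaned


if __name__ == "__main__":
    n_total = 51
    slow = [2.0 + math.cos(2.0 * math.pi * 3 * n / n_total) for n in range(n_total)]
    noisy = [s + 0.01 * math.sin(2.0 * math.pi * 20 * n / n_total) for n, s in enumerate(slow)]
    cleaned = denoise(noisy, s_noise=0.01, c_noise=0.01)
    print(max(abs(a - b) for a, b in zip(cleaned, slow)))
```

In this toy input the small high-frequency sine term plays the role of noise; its coefficients have amplitude 0.005 and fall below the floor, so the slow component is recovered.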

The noise floor amplitudes, $S_{noise}$ and $C_{noise}$, are determined statistically. Consider the
set of coefficients that are removed: $\mathcal{F}^{\sin}_{noise} = \{\alpha_k^{\sin} \in \mathcal{F}^{\sin} : |\alpha_k^{\sin}| \le S_{noise}\}$ for the Fourier
sine coefficients and $\mathcal{F}^{\cos}_{noise} = \{\alpha_k^{\cos} \in \mathcal{F}^{\cos} : |\alpha_k^{\cos}| \le C_{noise}\}$ for the Fourier cosine coefficients.
Because the distribution of the noise floor is Gaussian, $S_{noise}$ and $C_{noise}$ must be chosen
so that the distributions of coefficients in $\mathcal{F}^{\sin}_{noise}$ and $\mathcal{F}^{\cos}_{noise}$ are Gaussian as well. If either
amplitude is chosen too large, then coefficients from $\mathcal{F}^{\sin}$ or $\mathcal{F}^{\cos}$ that make up the signal,
$\sigma(t)$, would be included in the noise distributions as noise. As these coefficients are supposed
to be above the noise, they will skew and flatten the distribution; the skewness and the
kurtosis for the distribution of $\mathcal{F}^{\sin}_{noise}$ and of $\mathcal{F}^{\cos}_{noise}$ will then differ from their Gaussian
values if these coefficients are included. On the other hand, if either amplitude for the
noise floor is chosen too small, then coefficients from $\mathcal{F}^{\sin}$ or $\mathcal{F}^{\cos}$ that make up the noise
would be excluded from the noise distributions. As these coefficients would have populated
the tails of the Gaussian distribution, their removal will tend to narrow the distribution,
and the kurtosis of the noise distributions will differ once again from its Gaussian value.
(Because the coefficients are removed symmetrically about the horizontal zero line, a choice
of the amplitude for the noise floor that is too small will not tend to change the skewness
significantly.) Thus, $S_{noise}$ and $C_{noise}$ must be chosen so that the skewness and kurtosis of
the distributions of coefficients in $\mathcal{F}^{\sin}_{noise}$ and $\mathcal{F}^{\cos}_{noise}$ are as close to their Gaussian distribution
values as possible.
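One way to picture this selection is a scan over candidate floors that scores how far the skewness and kurtosis of the removed coefficients lie from the Gaussian values of zero and three. The sketch below is an assumption-laden illustration: the grid scan, the scoring rule (sum of the two absolute deviations), and every function name are choices made here, and the paper's own search algorithm is not specified in this section.

```python
import math
from statistics import NormalDist


def skewness_and_kurtosis(values):
    """Moment-based skewness and kurtosis (a Gaussian gives 0 and 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def gaussian_mismatch(values):
    """Hypothetical score: distance of (skewness, kurtosis) from (0, 3)."""
    skew, kurt = skewness_and_kurtosis(values)
    return abs(skew) + abs(kurt - 3.0)


def choose_noise_floor(coefficients, candidate_floors):
    """Return the candidate floor whose removed set looks most Gaussian."""
    best_floor, best_score = None, math.inf
    for floor in candidate_floors:
        noise = [c for c in coefficients if abs(c) <= floor]
        if len(noise) < 4 or max(noise) == min(noise):
            continue  # too few points, or no spread, to estimate the moments
        score = gaussian_mismatch(noise)
        if score < best_score:
            best_floor, best_score = floor, score
    return best_floor


def gaussian_quantiles(count):
    """Deterministic stand-in for Gaussian noise: evenly spaced normal quantiles."""
    dist = NormalDist()
    return [dist.inv_cdf((i + 0.5) / count) for i in range(count)]


if __name__ == "__main__":
    coefficients = gaussian_quantiles(2000) + [8.0, -9.0, 10.0, 11.0]
    print(choose_noise_floor(coefficients, [1.0, 2.0, 4.0, 12.0]))
```

In the toy data, floors of 1 and 2 truncate the Gaussian tails and lower the kurtosis, while a floor of 12 admits the four large "signal" values and inflates both moments, so the intermediate floor is selected.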

While the above procedure is straightforward, there is an additional constraint. The
volatility cannot be negative, and thus the resultant instantaneous volatility, $\sigma(t)$, obtained
after the noise floor is removed must be non-negative as well. This constraint is not trivial. For
a number of stocks, a choice of $S_{noise}$ and $C_{noise}$ that results in noise distributions that are
closest to a Gaussian distribution also results in a $\sigma(t)$ that is negative on certain days. To
obtain a $\sigma(t)$ that is non-negative, slightly larger amplitudes for the noise floors were chosen,
which resulted in a slightly larger skewness and kurtosis.

This approach to removing the noise from the volatility, $\sigma_n$, has been successfully applied
to all 24 stocks using a simple C++ program that implements an iterative search algorithm
to determine $S_{noise}$ and $C_{noise}$. The results of our numerical analysis are shown in Table
IV. There, we have listed the noise floor amplitudes, $S_{noise}$ and $C_{noise}$, used for each of the
24 stocks. Their values are given as multiples of the standard deviation of the distribution
of the Fourier coefficients in $\mathcal{F}^{\sin}_{noise}$ and $\mathcal{F}^{\cos}_{noise}$. As these values range from 3.030 times the
standard deviation to 3.948 times the standard deviation, 99.756% to 99.992% of the data
points that make up a Gaussian distribution can be included in these distributions if they
are present in either $\mathcal{F}^{\sin}$ or $\mathcal{F}^{\cos}$.
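The quoted percentages follow from the two-sided coverage of a Gaussian distribution, which is $\mathrm{erf}(m/\sqrt{2})$ for a cut at $m$ standard deviations. A short check, with a function name chosen here for illustration:

```python
import math


def gaussian_coverage(multiple):
    """Fraction of a Gaussian distribution lying within +/- multiple standard deviations."""
    return math.erf(multiple / math.sqrt(2.0))


if __name__ == "__main__":
    for multiple in (3.030, 3.948):
        print(multiple, f"{100.0 * gaussian_coverage(multiple):.3f}%")
```

The printed values should agree with the percentages quoted in the text to within rounding in the last digit.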

Listed also in Table IV are the kurtoses for $\mathcal{F}^{\sin}_{noise}$ and $\mathcal{F}^{\cos}_{noise}$. We have found that they
range in value from 2.95 to 3.31, and are thus very close to the Gaussian distribution value of
three for the kurtosis. The skewness of the distribution of $\mathcal{F}^{\cos}_{noise}$ was calculated
as well, and was found to vary in value from -0.03 to 0.11; this also is very close to the
Gaussian distribution value of zero for the skewness. The skewness for the distribution of
$\mathcal{F}^{\sin}_{noise}$ was also calculated, but we find that its values are 10^^ to 10^^ times smaller than
the skewness for the distribution of $\mathcal{F}^{\cos}_{noise}$, and there was no need to list these values in
the table. This extremely close agreement with the skewness of the Gaussian distribution
arises because the Fourier sine coefficients are antisymmetric about $k = 0$: $\alpha_k^{\sin} = -\alpha_{-k}^{\sin}$. The
average of any odd power of $\alpha_k^{\sin}$ over $k$, and in particular the skewness of the distribution
of $\mathcal{F}^{\sin}$, thus automatically vanishes. For this reason, the skewness for $\mathcal{F}^{\sin}_{noise}$ is exceedingly
small.
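The cancellation of odd powers can be seen directly in a small numerical sketch. It assumes sine coefficients defined with a $1/N$ normalisation over $k = -(N-1)/2, \ldots, (N-1)/2$; the input series and the function names are arbitrary illustrations.

```python
import math


def sine_coefficients(x):
    """Fourier sine coefficients of a real series of odd length N, keyed by k."""
    n_total = len(x)
    half = (n_total - 1) // 2
    return {
        k: sum(v * math.sin(2.0 * math.pi * k * n / n_total) for n, v in enumerate(x)) / n_total
        for k in range(-half, half + 1)
    }


def odd_power_sum(coeffs, power, floor=math.inf):
    """Sum of an odd power over the coefficients with amplitude at most `floor`."""
    return sum(c ** power for c in coeffs.values() if abs(c) <= floor)


if __name__ == "__main__":
    x = [0.9, 1.4, 0.7, 1.1, 1.8, 0.6, 1.2, 1.0, 1.5]
    coeffs = sine_coefficients(x)
    print(odd_power_sum(coeffs, 1), odd_power_sum(coeffs, 3))
    print(odd_power_sum(coeffs, 3, floor=0.05))
```

Because a cut on the amplitude removes coefficients at $k$ and $-k$ together, the pairwise cancellation survives the noise-floor selection, leaving only floating-point rounding.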